## On Interactive Machine Learning

When talking about machine learning, you may encounter many terminologies such as such as “online learning,” “active learning,” and “human in the loop” methods. Here are some of my thoughts on the relationship between interactive machine learning and machine learning in general. This is an extract from my answers to my comprehensive exam.

Traditionally machine-learning has been classified into supervised and unsupervised learning families. In supervised learning the training data, $\mathcal{D}$, consists of N sets of feature vectors each with a desired label provided by a teacher:

Training Set  $\hspace{10pt} \mathcal{D} = \{(\textbf{x}_i, y_i)\}_{i=1}^{N}$

where, $\textbf{x}_i \in \mathcal{X}$ is a d-dimensional feature vector

and $y_i \in \mathcal{Y}$ is the known label for it

The task is to learn a function, $f : \mathcal{X} \to \mathcal{Y}$, which can be used on unseen data.

In unsupervised learning, our data consists of vectors $\textbf{x}_i$, but no target label $y_i$. Common tasks under this category include clustering, density estimation and discovering patterns. A combination of these two is called semi-supervised learning, which has a mixture of labeled and unlabeled data in the training set. The algorithm assigns labels for missing data points using certain similarity measures.

While researchers are actively looking at improving the unsupervised learning techniques, supervised machine learning has been the dominant form of learning till date. However, traditional supervised algorithms assume that we have training data along with the labels readily available. They are not concerned with the process of obtaining the target values $y_i$s for the training dataset. Often, obtaining labelled data is one of the main bottlenecks in applying these techniques in domain specific applications. Often, obtaining labeled data is one of the main bottlenecks in applying these techniques in domain specific applications. Further, current approaches do not provide easy mechanisms for the end-users to correct problems when models deviate from the desired learning concept. NLP models are often built by experts in linguistics and/or machine learning, with limited or no scope for the end-users to provide input. Here the domain experts, or the end-users, provide input to models as annotations for a large batch of training data. This approach can be expensive, inefficient and even infeasible in many situations. This includes many problems in the clinical domain such as building models for analyzing EMR data.

“Human-in-the-loop” algorithms may be able to leverage the capabilities of a domain expert during the learning process. These algorithms can optimize their learning behavior through interaction with humans. Interactive Machine Learning (IML) is a subset of this class of algorithms. It is defined as the process of building machine learning models iteratively through end-user input. It allows the users to review model outputs and make corrections by giving feedback for building revised models. The users are then able to see model changes and verify them. This feedback loop (See Figure: Interactive Learning Loop) allows end-users to refine the models further with every iteration. Some early examples for this definition include applications in image segmentation, interactive document clustering, document retrieval, bug triaging and even music composition. You can read more about this in the article titled "Power to the People: The Role of Humans in Interactive Machine Learning" (Amershi et.al., 2014).

Interactive machine learning builds on a variety of styles of learning algorithms:

• Reinforcement Learning: In this class of learning we still want to learn $f : \mathcal{X} \to \mathcal{Y}$ but we see samples of $\textbf{x}_i$ but no target output $y_i$. Instead of $y_i$, we get a feedback from a critic about the goodness of the predicted output. The goal of the learner is to optimize for the reward function by selecting outputs that get best scores from the critics. The critic can be a human or any other agent. There need not be a human-in-the-loop for the algorithm to be classified under reinforcement learning. Several recent examples of this type include building systems that learn to play games such as Flappy Bird, Mario etc.
• Active Learning:  Active learning algorithms try to optimize for the number of training examples. Such an algorithm would ask an oracle to give labels such that it can achieve higher accuracy with smallest number of queries. These queries contain a batch of examples to be labelled. For example, in SVMs, one could select training sets for labeling that are closest to the margin hyperplanes to reduce the number of queries.
• Online Algorithms: Online learning algorithms are used when training data is available in sequential order, say due to the nature of the problem or memory constraints, as opposed to a batch learning technique where all the training data is available at once. The algorithm must adapt to the continuous stream of data made available to it. Formulating the learning problem to handle this situation forms the core of designing algorithms under this class.
A commonly used example would be the online gradient descent method for linear regression: Suppose we are trying to learn the parameters $\mathbf{w}$ for $f(\mathbf{x}) = w_0 + w_1x_1 + \ldots w_d x_d$. We update the weights when we receive the $i$th training example by taking the gradient of the defined error function:
$\mathbf{w}_{new} \leftarrow \mathbf{w} - \alpha \times \Delta_{\mathbf{w}} Error_i (\mathbf{w})$. Where, $\alpha$ is defined as the learning rate.

Interactive machine learning methods can include all or some of these learning techniques. The common property between all the interactive machine learning methods is the tight interaction loop between the human and the learning algorithm. Most of the effort in interactive machine learning has been about designing interactions for each step of this loop. My work on interactive clinical and legal text analysis also follows this pattern. You are welcome to check out those posts as well!

## References

1. Amershi et.al. (2014), Power to the People: The Role of Humans in Interactive Machine Learning. Available: https://www.microsoft.com/en-us/research/publication/power-to-the-people-the-role-of-humans-in-interactive-machine-learning/.

## Hey, I passed another exam!

Today, I have completed three years of having a blog. I took to blogging as a way to document my PhD experiences (and for learning to write :D). Though, it was very satisfying to see tens of thousands of visitors finding posts of their interest here.

As a coincidence I also passed my PhD comprehensive exam today and wanted to write-up a post to help future students understand these milestones. As a PhD student you take so many courses and exams, but you also need to pass a few extra special ones. Different departments and schools have their own requirements but the motivation behind having each of the milestones is similar.

ISP has three main exams on a way to PhD. You first finish all your coursework and take a preliminary exam, or prelims, with a 3-member committee of your choice. The goal here is to prove your ability to do original research by presenting the work you’ve done till then. At this point, you already have or are on your way toward your first publication in the program. After taking this exam and completing the coursework, you are eligible to receive your masters (or second masters) degree.

Next is the comprehensive exam (comps). The committee structure is similar to the prelims, but here you pick three topics related to your research and decide a member responsible for each. By working with your committee members, you prepare a reading list of recent publications, important papers and book chapters.

Each of the committee members will select a list of questions for you to answer. You get 9 days to answer these questions. It may be challenging to keep up with all the papers in the list if it has a lot of items. Usually it is a good idea to include those papers that you have referred to in your prior research work.

I immensely enjoyed this process and was reminded of the Illustrated guide to a PhD by Matt Might. Specially the one about “Reading research papers takes you to the edge of human knowledge”. If you haven’t seen those posts and intend to pursue a PhD, I would definitely recommend them.

Most of the questions in my exam were subjective, open-ended problems. Except the first one which made me wonder if I was interpreting it correctly. I guess, it was only there as a loosener [1] .

After you send in your written answers, you do an oral presentation in front of all three committee members. I was also asked a few follow-up questions based on my responses. Overall, it went smoothly and every one left pleased with my presentation.

## Footnotes

1. A term used in cricket for an easy first ball of the over ^

## On Clippy and building software assistants

I have been attending a reading group on visualization tools for the last few weeks. This is a unique multi-institution group that meets over web-conferencing at 4 PM EST / 1 PM PST on Fridays. It includes a diverse bunch of participants including non-academic researchers.

Every week we vote on and discuss a range of topics related to building tools for visualizing data.

This week, it was my turn to lead the discussion on the Lumiere paper. This is the research responsible for the now retired Clippy Office assistant. I also noticed a strong ISP presence in the references section as the paper focuses on Bayesian user modeling.

During the discussion, we talked about how we can offer help to use vis tools better. Here are my slides from it:

## References

1. Eric Horvitz, Jack Breese, David Heckerman, David Hovel, and Koos Rommelse. 1998. The lumière project: Bayesian user modeling for inferring the goals and needs of software users. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence (UAI’98), Gregory F. Cooper and Serafín Moral (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 256-265.
2. Justin Matejka, Wei Li, Tovi Grossman, and George Fitzmaurice. 2009. CommunityCommands: command recommendations for software applications. In Proceedings of the 22nd annual ACM symposium on User interface software and technology. ACM, New York, NY, USA, 193-202.

## Using Machine learning to help Manage Diabetes

I participated in the PennApps hackathon in Philadelphia this weekend. While most of the city was struck with a bad snow storm, a group of hackers holed up inside the Penn engineering buildings to work on some cool hacks. My team consisting of three other hackers: Daniel, Alex and Madhur, decided to work on an app that could predict blood glucose levels of diabetes patients by building machine learning models.

We used the OneTouch Reveal API to gather some data provided by the Johnson & Johnson’s company. They are the manufacturers of OneTouch glucose monitors for diabetes patients. They also give their patients an app for tagging events like exercise (light, moderate, heavy etc.), when they eat food and use insulin (different kinds – fast acting, before/after meals etc.). Our team thought that it might be a good idea to hack on this dataset to find out whether we could predict patients’ glucose levels without them having them to punch a hole in their fingers. A real world use case for this app would be to alert a patient when we predicted unusual glucose levels or have them do an actual blood test when the confidence on our predictions falls low.

We observed mixed results for the patients in our dataset. We did reasonably well for those with more data, but others had very few data points to make good predictions. We also saw that our predictions became more precise as we considered more data. Another issue was that the OneTouch API did not give sufficient information about food and exercise events for any of the patients – mostly without additional event tagging. As a result, our models were not influenced much by them.

We believe that in the near future, it would be common for the patients to have such monitors communicate with other wearable sensors such as smart watches. Such systems would be able to provide ample information about one’s physical activity etc., to make more meaningful predictions possible. Here’s a video demonstrating our proof-of-concept:

.

## Interactive Natural Language Processing for Legal Text

Update: We received the best student paper award for our paper at JURIX’15!

In an earlier post, I talked about my work on Natural Language Processing in the clinical domain. The main idea behind the project is to enable domain experts to build machine learning models for analyzing text. We do this by designing usable tools for NLP without really having the need to send datasets to machine learning experts or understanding the inner working details of the algorithms. The post also features a demo video of the prototype tool that we have built.

I was presenting this work at my program’s bi-weekly meetings where Jaromir, a fellow ISP graduate student, pointed out that such an approach could be useful for his work as well. Jaromir also holds a degree in Law and works on building AI systems for legal applications. As a result, we ended up collaborating on a project on using the approach for statutory analysis. While, the main topic of discussion in the project is on the framework in which a human experts cooperate with a machine learning text classification algorithm, we also ended up augmenting our approach with a new way of capturing and re-using knowledge. In our tool datasets and models are treated separately and our not tied together. So, if you were building a classification model for say statutes from the state of Alaska, when you need to analyze laws from Kansas you need not start from scratch. This allows us to be in a better starting place in terms of all the performance measures and build a model using fewer training examples.

We will be presenting this work at JURIX’15 during the 28th year of the conference focusing on legal information systems. Previously, we had presented portions of this work at the AMIA Summit on Clinical Research Informatics and at the ACM IUI Workshop on Visual Text Analytics.

## References

Jaromír Šavelka, Gaurav Trivedi, and Kevin Ashley. 2015. Applying an Interactive Machine Learning Approach to Statutory Analysis. In Proceedings of the 28th International Conference on Legal Knowledge and Information Systems (JURIX ’15). Braga, Portugal. [PDF] – Awarded the Best Student Paper (Top 0.01%).

## Machines learn to play Tabla

Update: This post now has a Part 2.

If you follow machine learning topics in the news, I am sure by now you would have come across Andrej Karpathy‘s blog post on The Unreasonable Effectiveness of Recurrent Neural Networks.[1] Apart from the post itself, I have found it very fascinating to read about the diverse applications that its readers have found for it. Since then I have spent several hours hacking with different machine learning models to compose tabla rhythms:

Although Tabla does not have a standardized musical notation that is accepted by all, it does have a language based on the bols (literally, verbalize in English) or the sounds of the strokes played on it. These bols may be expressed in written form which when pronounced in Indian languages sound like the drums. For example, the theka for the commonly used 16-beat cycle – Teental is written as follows:

Dha | Dhin | Dhin | Dha | Dha | Dhin | Dhin | Dha
Dha | Tin  | Tin  | Ta  | Ta  | Dhin | Dhin | Dha


For this task, I made use of Abhijit Patait‘s software – TaalMala, which provides a GUI environment for composing Tabla rhythms in this language. The bols can then be synthesized to produce the sound of the drum. In his software, Abhijit extended the tabla language to make it easier for users to compose tabla rhythms by adding a square brackets after each bol that specify the number of beats within which it must be played. You could also lay more emphasis on a particular bol by adding ‘+’ symbols which increased their intensity when synthesized to sound. Variations of standard bols can be defined as well based on different the hand strokes used:

Dha1 = Na + First Closed then Open Ge

Now that we are armed with this background knowledge, it is easy to see how we may attempt to learn tabla like a language model using Natural Language Processing techniques. Predictive modeling of tabla has been previously explored in "N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition" (Avinash Sastry, 2011). But, I was not able to get access to the datasets used in the study and had to rely on the compositions that came with the TaalMala software.[2] This is comparatively a much smaller database than what you would otherwise use to train a neural network: It comprises of 207 rhythms with 6,840 bols in all. I trained a char-rnn and sampled some compositions after priming it with different seed text such as “Dha”, “Na” etc. Given below is a minute long composition sampled from my network. We can see that not only the network has learned the TaalMala notation but it has also understood some common phrases used in compositions such as the occurrence of the phrase “TiRa KiTa“, repetitions of “Tun Na” etc.:

Ti [0.50] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki
| Ta | Tun [0.50] | Na | Dhin | Na
| Tun | Na | Tun | Na | Dha | Dhet | Dha | Dhet | Dha | Dha
| Tun | Na | Dha | Tun | Na | Ti | Na | Dha | Ti | Te | Ki |
Ti | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] |
Dhin | Dhin | Dha | Ge | Ne | Dha | Dha | Tun | Na | Ti
[0.25] | Ra | Ki | Ta | Dha [0.50] | Ti [0.25] | Ra | Ki |
Te | Dha [1.00] | Ti | Dha | Ti [0.25] | Ra | Ki | Te | Dha
[0.50] | Dhet | Dhin | Dha | Tun | Na | Ti [0.25] | Ra | Ki
| Ta | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Ti | Ka | Tra
[0.50] | Ti | Ti | Te | Na [0.50] | Ki [0.50] | Dhin [0.13]
| Ta | Ti [0.25] | Ra | Ki | Te | Tra | Ka | Ti [0.25] | Ra
| Ki | Te | Dhin [0.50] | Na [0.25] | Ti [0.25] | Ra | Ki |
Te | Tra | Ka | Dha [0.34] | Ti [0.25] | Ra | Ki | Ta | Tra
| Ka | Tra [0.50] | Ki [0.50] | Tun [0.50] | Dha [0.50] | Ti
[0.25] | Ra | Ki | Ta | Tra | Ka | Ta | Te | Ti | Ta | Kat |
Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Dha
[0.50] | Dhin | Dhin | Dhin | Dha | Tun | Na | Ti | Na | Ki
| Ta | Dha [0.50] | Dha | Ti [0.50] | Ra | Ki | Te | Tun
[0.50] | Tra [0.25] | Ti [0.25] | Ra | Ki | Te | Tun | Ka |
Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Ki [0.25] | Ti | Dha
| Ti | Ta | Dha | Ti | Dha [0.50] | Ti | Na | Dha | Ti
[0.25] | Ra | Ki | Te | Dhin [0.50] | Na | Ti [0.25] | Ra |
Ki | Te | Tra | Ka | Dha [0.50] | Ti [0.50] | Ra | Ki | Te |
Tun [0.50] | Na | Ki [0.25] | Te | Dha | Ki | Dha [0.50] |
Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki
| Te | Dha [0.50] | Tun | Ti [0.25] | Ra | Ki | Te | Dhin
[0.50] | Na | Ti [0.25] | Te | Dha | Ki [0.25] | Te | Ki |
Te | Dhin [0.50] | Dhin | Dhin | Dhin | Dha | Dha | Tun | Na
| Na | Na | Ti [0.25] | Ra | Ki | Ta | Ta | Ka | Dhe [0.50]
| Ti [0.25] | Ra | Ki | Te | Ti | Re | Ki | Te | Dha [0.50]
| Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Ti |
Te | Ti | Te | Ti | Te | Dha [0.50] | Ti [0.25] | Te | Ra |
Ki | Te | Dha [0.50] | Ki | Te | Dha | Ti [0.25]

Here’s a loop that I synthesized by pasting a composition sampled 4 times one after the another:

Of course, I also tried training n-gram models and the smoothing methods using the SRILM toolkit. Adding spaces between letters is a quick hack that can be used to train character level models using existing toolkits. Which one produces better compositions? I can’t tell for now but I am trying to collect more data and hope to add updates to this post as and when I find time to work on it. I am not confident if simple perplexity scores may be enough to judge the differences between two models, specially on the rhythmic quality of the compositions. There are many ways in which one can extend this work. One there is a possibility of training on different kinds of compositions: kaidas, relas, laggis etc., different rhythm cycles and also from different gharanas. All of this would required collecting a bigger composition database:

And then there is a scope for allowing humans to interactively edit compositions at places where AI goes wrong. You could also use the samples generated by it as an infinite source of inspiration.

Finally, here’s a link to the work in progress playlist of the rhythms I have sampled till now.

## References

1. Avinash Sastry (2011), N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition. Available: https://smartech.gatech.edu/bitstream/handle/1853/42792/sastry_avinash_201112_mast.pdf?sequence=1.

## Footnotes

1. If you encountered a lot of new topics in this post, you may find this post on Understanding natural language using deep neural networks and the series of videos on Deep NN by Quoc Le helpful. ^
2. On the other hand, Avinash Sastry‘s work uses a more elaborate Humdrum notation for writing tabla compositions but is not as easy to comprehend for tabla players. ^