Category Archives: Man vs. Machine

Machines learn to play Tabla, Part – 2

This is a follow-up to my earlier post, Machines Learn to play Tabla. You may wish to check it out first before reading this one…

Three years ago, I published a post on using recurrent neural networks to generate tabla rhythms. Sampling music from machine learned models was not in vogue then. My post received a lot of attention on the web and became very popular. The project had been a proof-of-concept and I have wanted to build on it for a long time now.

This weekend, I worked on making it more interactive and I am excited to share these updates with you. Previously, I was using proprietary software to convert tabla notation to sound. That made it hard to experiment with sampled rhythms and I could share only a handful of sounds. Taking inspiration from our friends at Vishwamohini, I am now able to convert bols into rhythm on the fly using MIDI.js.

Let me show off the new JavaScript synthesizer using a popular Delhi kaida. Hit the ‘play’ button to listen:

Now that you’ve heard the computer play, here’s an example of it being played by a tabla maestro:

Of course, the synthesized outcome is not much of a comparison to the performance by the maestro, but it is not too bad either…

Now to the more exciting part: since our browsers have learned to play the tabla, we can throw in the char-rnn model that I built in the earlier post. To do this, I used the RecurrentJS library and combined it with my JavaScript tabla player:

Feel free to play around with the tempo and the maximum character limit for sampling. When you click on ‘generate’, it will play a new rhythm every time. Hope you’ll enjoy playing with it as much as I did!
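To give a feel for what the two knobs do, here is a minimal sketch in Python (not the player's actual JavaScript). It assumes the sampled text is just a whitespace-separated string of bols, which is a simplification of the real notation, and the function name `schedule_bols` and its parameters are hypothetical. The character limit clips the sampled text, and the tempo fixes the time between strokes at 60 / bpm seconds.

```python
def schedule_bols(notation, tempo_bpm=120, max_chars=200, bols_per_beat=1):
    """Turn a sampled bol string into (start_time_in_seconds, bol) events.

    Assumption: bols are separated by whitespace. The real notation used by
    the player is richer than this.
    """
    if tempo_bpm <= 0 or bols_per_beat <= 0:
        raise ValueError("tempo_bpm and bols_per_beat must be positive")

    # Apply the character limit to the sampled text.
    clipped = notation[:max_chars]
    tokens = clipped.split()

    # If the limit cut a bol in half, drop the partial token.
    cut_mid_token = (
        len(notation) > max_chars
        and not notation[max_chars].isspace()
        and not clipped[-1:].isspace()
    )
    if cut_mid_token and tokens:
        tokens.pop()

    # One beat lasts 60 / bpm seconds; each bol gets an equal share of it.
    seconds_per_bol = 60.0 / (tempo_bpm * bols_per_beat)
    return [(round(i * seconds_per_bol, 3), bol) for i, bol in enumerate(tokens)]


# Placeholder bols, not a real composition.
events = schedule_bols("dha ti ta", tempo_bpm=120)
```

Each event could then be handed to a synthesizer that maps a bol to a sound; that mapping is left out here because it depends entirely on the sound bank in use.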

The player has a few kinks at this point, and I am working towards fixing them. You too can contribute to my repository on GitHub.

There are two areas that need major work:

Data: The models for my earlier post were trained on a small amount of data. I have been on the lookout for a better dataset since then. I wrote a few emails, but without much success till now. I am interested in knowing about more datasets I could train my models on.

Modeling: Our model did a very good job of understanding the structure of TaalMala notations. Although character-level recurrent neural networks work well, they are still based on a very shallow understanding of the rhythmic structures. I have not come across any good approaches for generating true rhythms yet.

I think more data samples covering a range of rhythmic structures would only partially address this problem. Simple rule-based approaches seem to outperform machine learned models with very little effort. Vishwamohini.com has some very good rule-based variation generators that you could check out. They sound better than the ones created by our AI. After all, the word for compositions, bandish, is literally derived from ‘rules’ in Hindi. But on the other hand, there are only so many handcrafted rules that you can come up with before they start sounding repetitive.

Contact me if you have some ideas and if you’d like to help out! Hope that I am able to post an update on this sooner than three years this time 😀

Machines understand Rahul Gandhi!

I have a (bad) habit of checking my Twitter feed while at work. Yesterday after my machine learning class, I found my timeline filled with Tweets mocking Rahul Gandhi about his first-ever television interview. Naturally, I was curious to know why and I tried to give it a listen. Most of his answers made no sense to me whatsoever! But then guess what? Who else is bad at responding to questions in natural language? The machines are! Maybe it was time to put them to a test and see if the machines could understand Mr. Gandhi. Making use of the transcript made available by the Times of India and some free NLP tools(ets), I spent a couple of hours (unproductive, of course :P) trying to make sense of the interview.

Here’s a wordle summary of his answers that should at least give you an overview of what was being talked about during the interview:

Such system. Many people. Wow! Apparently the word ‘system’ was used 70 times during the entire interview.

Here are some of the most used (best) words from the transcript. The number of times each was used is given in parentheses.

  1. system (70)
  2. people (66)
  3. going (52)
  4. party (51)
  5. country (44)
  6. want (34)
  7. congress (34)
  8. power (32)
  9. political (31)
  10. issue (26)
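Counts like these are easy to reproduce. Here is a minimal sketch, assuming the transcript is available as a plain string; the tiny stop-word list and the function name `top_words` are my own placeholders, and a different stop-word list or tokenizer would give slightly different numbers from the ones above.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list (an assumption, not the
# list used by the word cloud tool).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in",
              "that", "we", "i", "are", "this", "it"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    # Lowercase and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


sample = "The system is the system. People want power; people in the system."
print(top_words(sample, n=2))
```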

Next, I set out to generate a summary of his answers. And lo! to my surprise, it made perfect sense (contrary to what you usually get from a summarizer). This is the summary generated from the online tool at http://freesummarizer.com/:

What I feel is that this country needs to look at the fundamental issues at hand, the fundamental political issue at hand is that our Political system is controlled by too few people and we absolutely have to change the way our political system is structured, we have to change our Political parties, we have to make them more transparent, we have to change the processes that we use to elect candidates, we have to empower women in the political parties, that is where the meat of the issue but I don’t hear that discussion, I don’t hear the discussion about how are we actually choosing that candidate, that is never the discussion.

That ascribes huge power to the Congress party, I think the Congress party’s strength comes when we open up when we bring in new people, that is historically been the case and that is what I want to do.

The Gujarat riots took place frankly because of the way our system is structured, because of the fact that people do not have a voice in the system. And what I want to do. He was CM when Gujarat happened The congress party and the BJP have two completely different philosophies, our attack on the BJP is based on the idea that this country needs to move forward democratically, it needs push democracy deeper into the country, it needs to push democracy into the villagers, it needs to give women democratic powers, it needs to give youngsters democratic powers.

You are talking about India, we have had a 1 hour conversation here, you haven’t asked me 1 question about how we are going to build this country, how we are going to take this country forward, you haven’t asked me one question on how we are going to empower our people, you haven’t asked me one question on what we are going to do for youngsters, you are not interested in that.

There is the Congress Party that believes in openness, that believes in RTI, that believes in Panchayati Raj, that believes in giving people power. The Congress party is an extremely powerful system and all the Congress party needs to do is bring in younger fresher faces in the election which is what we are going to do and we are going to win the election.

In retrospect, repeating a few points several times is a good enough cue for an auto-summarizer to identify important sentences. This interview was perfect for a task like this as Mr. Gandhi repeated the same set of (rote) answers for almost every question that he was asked. Perhaps this is what he was hoping for? To make sure that when lazy journalists use automatic tools to do their jobs, it would give them a perfect output!
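To make that intuition concrete, here is a minimal sketch of a frequency-based extractive summarizer. I do not know what freesummarizer.com actually does internally, so treat this as a generic stand-in: sentences are scored by the average transcript-wide frequency of their words, which means a sentence built from often-repeated words floats to the top. The function name and parameters are hypothetical.

```python
import re
from collections import Counter


def summarize(text, num_sentences=2, stop_words=frozenset()):
    """Pick the sentences whose words are, on average, repeated most often."""
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in stop_words]
        for s in sentences
    ]

    # Word frequencies over the whole text.
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        # Average frequency, so long sentences are not favoured just for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


text = ("We must change the system. The weather was pleasant today. "
        "The system needs change.")
print(summarize(text, num_sentences=2, stop_words={"the", "we", "must", "was"}))
```

In the toy example the two sentences about changing the system outrank the one about the weather, simply because their words recur.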

Now coming to the interesting bit. If you were a human listener like me and wanted to read the answers that he really did attempt to answer [1], what would you do? Fear not! I have built an SVM classifier from this transcript that you could make use of in the future. I used LightSide, an open source platform created by CMU LTI researchers, to understand features from the transcript of his answers. Let’s get into the details then.

When you go for an interview, you could either choose to answer a question or try to avoid it by cleverly diverting from the main question asked. In Rahul’s case, we have answers that can be mainly grouped into three categories: a) the questions that he answered, b) the ones he managed to successfully avoid and c) the LOL category (the answer bears no resemblance to the question asked). I combined categories (b) and (c) to come up with two classes: ANSWERED or UNANSWERED. You may check out my list of classes here and read the interview answers from the Times of India article here. They follow the same order as in the transcript with the exception of single-line question-answers that would’ve otherwise served as noise for machine learning. I selected a total of 114 questions in all, out of which 45 were answered and the remaining 69 were either successfully avoided or belonged to the LOL category [2].

For feature extraction, I used quite simple language features like bigrams, trigrams, line length after excluding stop words etc. You can download them in the LightSide feature format. I used the SVM plugin to learn the classification categories from the features. Here is the final model that the tool built using the extracted features. And the results were surprising (or probably not :). With 10-fold cross validation, the resulting model had an accuracy of over 72%! An accuracy percentage like this is considered to be exceptional (in case you are not familiar with the field). The machines indeed understand Rahul Gandhi!
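For readers who have not seen such features before, here is a minimal sketch of what bigram, trigram and stop-word-free length features look like. It is written from scratch for illustration; LightSide's own extractors and feature names will differ, and the helper names below are hypothetical. The last function puts the accuracy in context using only the class counts given above.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(answer, stop_words=frozenset()):
    """Count bigrams and trigrams, plus answer length without stop words."""
    # Whitespace tokenization only; punctuation stays attached to words.
    tokens = answer.lower().split()
    features = Counter()
    for n in (2, 3):
        for gram in ngrams(tokens, n):
            features[f"{n}gram={gram}"] += 1
    features["length_no_stop"] = sum(1 for t in tokens if t not in stop_words)
    return features


def majority_baseline(class_counts):
    """Accuracy of always predicting the most common class."""
    return max(class_counts.values()) / sum(class_counts.values())


feats = extract_features("the system is the system", stop_words={"the", "is"})
baseline = majority_baseline({"ANSWERED": 45, "UNANSWERED": 69})
print(feats["2gram=the system"], feats["length_no_stop"], round(baseline, 3))
```

With 45 ANSWERED and 69 UNANSWERED examples, a classifier that always says UNANSWERED would already be right about 60.5% of the time, so that is the number the cross-validated accuracy should be read against.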

Unfortunately, I did not have enough data to run a couple of tests separately. We’ll probably have to wait for Mr. Gandhi to give his next interview for that. Hope that the Congress party members work as hard as the NLP researchers so that we can have a good competition by then!

Footnotes

  1. To his credit, he did make an effort to answer about 40% of the questions.
  2. These are solely based on my personal opinion.

Talk: Socially Embedded Search

This week I attended a full-house talk by Dr. Meredith Ringel Morris on Socially Embedded Search Engines. Dr. Morris put together a lot of material in her presentation and we (the audience) could appreciate how she presented all of it, with great clarity, in just one hour. But I think it would be tricky for me to summarize everything in a short post. Do check out Dr. Morris’ website to find out more information on the subject.

Social Search is the term for when you pose a question to your friends by using one of the social networking tools (like Facebook, Twitter). There is a good chance that you might have already been using “Social Search” without knowing the term for it. So, why would you want to do that instead of using regular search engines that you have access to? It may be simpler to ask your friends at times and they could also provide direct, reliable and personalized answers. Moreover, this is something that could work along with the traditional search engines. Dr. Morris’ work gives some insight into the areas where search engineers have opportunities to combine traditional algorithmic approaches with social search. She tells us what kinds of questions are asked more often in a social search and which types are more likely to succeed in getting a useful answer. She goes on further into how the topics for these questions vary with people from different cultures.

I really liked the part about “Search buddies” during the talk. In their paper, Dr. Morris and her colleagues have proposed implanting automated agents that post relevant replies to your social search queries. One type of agent tries to figure out the topic of the question and recommends friends who seem to be interested in that area by looking at their profiles. Another one would try to use an algorithmic approach and post a link to a web page that is likely to contain an answer to the question. It was interesting to know more about how other people reacted to the involvement of these automated agents. While some of the people in the experiment appreciated being referred to for an answer, a lot of them found the agents obnoxious when they didn’t perform well in identifying the contexts.

In her more recent work, Dr. Morris has tried to solve these problems by recruiting real people from Mechanical Turk to answer questions on Twitter. Such an approach could respond to people’s questions in a smarter way by collecting information from several people. It could then respond to these questions in the form of a polling result and quote the number of people recommending a particular answer. It can also work by taking into account any other replies that the participant would have already received from one of his followers. The automated agent would then present that answer for an opinion poll from the Turkers. Although such a system could provide more intelligent replies than ‘dumb’ algorithms, it may still fail in comparison to responses from your friends, which would certainly be more personalized and placed better contextually.

During the QnA session, one of the audience members raised a question (with a follow-up question by Prof. Kraut) about comparing these methods with question-and-answer websites such as Quora. While these sites may not provide results that are as personalized, they will certainly do better in drawing the attention of people interested in similar topics. It may not always be possible to find somebody amongst your friends to answer a question on a specialized topic.

Dr. Morris’ talk provided some really good backing for some of the recent steps taken by search engines like Bing (having ties with both Twitter and Facebook), Google (and the Google plus shebang) and also Facebook (with Graph Search) in this direction. It would be interesting to see how social computing research shapes the future of internet search.

Further Reading

You can find Dr. Morris’ publications on this topic here: http://research.microsoft.com/en-us/um/people/merrie/publications.html

How about collaboration?

My previous post on Computers and Chess serves as a good prologue to this one.

That’s me geeking out at the Jeopardy stage setup.

A little more than two years ago, IBM’s Watson played against and defeated the previous champions of Jeopardy!, the TV game show in which the contestants are tested on their general knowledge with quiz-style questions.[1] I remember being so excited while watching this episode that I ended up playing it over and over again, only to have the Jeopardy jingle loop in my head for a couple of days! Now, this is a much harder challenge for computer scientists to solve than making a machine play chess.

Computers have accomplished so many things that we thought only humans could do (play chess and Jeopardy!, drive a car all by themselves …). While these examples are by no means small problems, we still have a long way to go. While computers can solve problems that we as humans often find difficult (such as playing chess, calculating 1234567890 raised to the power 42 etc.), they cannot* do a lot of things that you and I take for granted. For example, a computer can’t comprehend this post as well as you do (Watson may not be able to answer everything), read it out naturally and fluently (Siri still sounds robotic) or make sense of the visuals on this page (and so on). *At least not yet.

Computers were designed as tools to help us with calculations or computations. By this very definition, are computers inherently better at handling certain types of problems while failing at others? Well, we have no answer [2] to this question now, and I at least hope that it isn’t in the affirmative so that someday we can replicate human intelligence. As we have seen in the past, we certainly cannot say that “X” is something that computers will never be able to do. But we can surely point out the areas in which researchers are working hard and hoping to improve.

Here’s a video that talks about the topic that I am hinting at. While I promise not to post many TED talks in future, you can be sure of finding this central idea (the first half of the talk) as a common theme on this blog. Also, I prefer the word “Collaboration” over “Cooperation” [3] :

TL;DR: Let’s not try to solve big problems solely with computers. Make computers do the boring repetitive work and involve humans for providing creative inputs or heuristics for the machines. Try to improve interfaces that make this possible.

Although this idea was envisioned in "Man-Computer Symbiosis" (Licklider J. C. R., 1960) more than half a century ago, researchers seem not to have given it due importance when [4] the computers failed to perform as well as expected. Of course, the more “X”s that computers are able to do by themselves, the more it frees us to do whatever we do best. When we do look around and observe the devices that we use and how we interact with the machines every day, we seem to have knowingly or unknowingly progressed in the direction shown by Licklider. With the furthering of research in areas such as Human Computing, Social Computing, and (the new buzzword) Crowd-sourcing, the interest shown in such ideas has never been greater.

References

  1. Licklider J. C. R. (1960), Man-Computer Symbiosis. IRE Transactions on Human Factors in Electronics. Available: http://groups.csail.mit.edu/medg/people/psz/Licklider.html.

Footnotes

  1. More about Watson from IBM here. See also, Jeopardy vs. Chess.
  2. Amazon’s Mechanical Turk does talk about “HITs” or Human Intelligence Tasks.
  3. In AI terms, it would indeed be multi-agent co-operation but then again we are not treating humans just as agents in this case.
  4. AI Winter: http://en.wikipedia.org/wiki/AI_winter

Computers and Chess

Deep Blue vs. Kasparov: 1996 Game 1. Deep Blue won this game but Kasparov went on to win the match by 4-2. In the 1997 re-match, however, Deep Blue won 3½–2½.

Designing an algorithm for playing the game of chess has been one of the challenges that has attracted the attention of many mathematicians and computer scientists. The sheer number of combinatorial possibilities makes it hard to predict the result for humans and computers alike. There have been many highly publicized games pitting humans against (super) computers in the ’90s and ’00s, such as the Deep Blue vs. Kasparov one.

It was around the same time that I was starting out with chess and was interested in learning how to play better. My father had gifted me a copy of a computer game called Maurice Ashley Teaches Chess. It included playing strategies, past-game analysis and video coaching by the chess grandmaster Maurice Ashley. It also had a practice mode where you could compete and play against the computer. I didn’t end up being a good chess player but if my memory serves me right, it did not take me long to start beating the in-game AI. But things have changed a lot since then. Computers are not only faster and more powerful now (to explore more moves) but are also equipped with better algorithms to evaluate a decision. Let’s compare excerpts from the introductory chapters of two of my textbooks:

From "Cognitive Psychology" (Medin et al., 2004):

The number of ways in which the first 10 moves can be played is on the order of billions and there are more possible sequences for the game than there are atoms in the universe! Obviously neither humans nor machines can determine the best moves by considering all the possibilities. In fact, grandmaster chess players typically report that they consider only a handful of the possible moves and “look ahead” for only a few moves. In contrast, chess computers are capable of examining more than 2,000,000 potential moves per second and can search quite a few moves ahead. The amazing thing is that the best grandmasters (as of this writing) are still competitive with the best computers.

Now consider "Artificial Intelligence: A Modern Approach (3rd Edition)" (Russell et al., 2010):

IBM’s DEEP BLUE became the first computer program to defeat the world champion in a chess match when it bested Garry Kasparov by a score of 3.5 to 2.5 in an exhibition match (Goodman and Keene, 1997). Kasparov said that he felt a “new kind of intelligence” across the board from him. Newsweek magazine described the match as “The brain’s last stand.” The value of IBM’s stock increased by $18 billion. Human champions studied Kasparov’s loss and were able to draw a few matches in subsequent years, but the most recent human-computer matches have been won convincingly by the computer.

So, what happened in the six-year gap between the publishing of these books? It turns out that there has indeed been such a shift in recent years. The computers’ superior performance stats can be seen on this Wikipedia entry. We have come a long way since the Kasparov vs. Deep Blue matches due to the advancements in both hardware and AI algorithms. Computers have now started not only winning but dominating human-computer chess matches, so much so that even mobile phones running slower hardware are reaching Grandmaster levels. I guess the time is right for switching to new board games! Btw, checkers has been a solved problem since 2007: http://www.sciencemag.org/content/317/5844/1518.full. The game will end in a draw (they have a computational proof of that) if both players use a perfect strategy, i.e. one that never loses.
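What does it mean for a game to be “solved” as a draw? Here is a minimal sketch using exhaustive minimax on tic-tac-toe, a game small enough to search completely. This is only an illustration of the idea that perfect play from both sides has a fixed outcome; the checkers result needed vastly more computation than a brute-force search like this one, and all names below are my own.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, otherwise None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Outcome under perfect play, from X's side: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0  # board full, nobody won
    other = "O" if player == "X" else "X"
    results = [value(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == " "]
    # X picks the best outcome for X, O picks the worst outcome for X.
    return max(results) if player == "X" else min(results)


print(value(" " * 9, "X"))  # value of the empty board with X to move
```

The value of the empty board comes out as 0: neither side can force a win against a perfect opponent, which is the tic-tac-toe analogue of the checkers result.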

Image Credits: en:User:Cburnett / Wikimedia Commons / CC-BY-SA-3.0 / GFDL

References

  1. Russell et al. (2010), Artificial Intelligence: A Modern Approach (3rd Edition), 49. Prentice Hall. Available: http://www.amazon.com/Artificial-Intelligence-Modern-Approach-Edition/dp/0136042597.
  2. Medin et al. (2004), Cognitive Psychology, 8. Wiley. Available: http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0471458201.