I have a (bad) habit of checking my Twitter feed while at work. Yesterday, after my machine learning class, I found my timeline filled with tweets mocking Rahul Gandhi over his first-ever television interview. Naturally, I was curious to know why, so I gave it a listen. Most of his answers made no sense to me whatsoever! But then guess what? Who else is bad at responding to questions in natural language? The machines are! Maybe it was time to put them to the test and see if the machines could understand Mr. Gandhi. Using the transcript made available by the Times of India and some free NLP tool(set)s, I spent a couple of hours (unproductive, of course :P) trying to make sense of the interview.
Here’s a wordle summary of his answers, which should at least give you an overview of what was talked about during the interview:
Here are some of the most used (best) words from the transcript. The number of times each was used is given in parentheses.
- system (70)
- people (66)
- going (52)
- party (51)
- country (44)
- want (34)
- congress (34)
- power (32)
- political (31)
- issue (26)
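If you want to reproduce a list like this yourself, a minimal sketch could look like the one below. I am assuming a simple regex tokenizer and a tiny hand-made stop-word list here; the `top_words` helper and the `sample` text are made up for illustration and are not what produced the counts above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in",
              "that", "we", "i", "our", "by", "this", "it", "have"}

def top_words(text, n=10):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample text, not taken from the transcript.
sample = ("The system is the issue. We have to change the system, "
          "and the people want to change the system.")
print(top_words(sample, 2))  # [('system', 3), ('change', 2)]
```

Run it over the full transcript text instead of `sample` and you should get something close to the list above, depending on the stop-word list you pick.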
Next, I set out to generate a summary of his answers. And lo! to my surprise, it made perfect sense (contrary to what you usually get from a summarizer). This is the summary generated from the online tool at http://freesummarizer.com/:
What I feel is that this country needs to look at the fundamental issues at hand, the fundamental political issue at hand is that our Political system is controlled by too few people and we absolutely have to change the way our political system is structured, we have to change our Political parties, we have to make them more transparent, we have to change the processes that we use to elect candidates, we have to empower women in the political parties, that is where the meat of the issue but I don’t hear that discussion, I don’t hear the discussion about how are we actually choosing that candidate, that is never the discussion.
That ascribes huge power to the Congress party, I think the Congress party’s strength comes when we open up when we bring in new people, that is historically been the case and that is what I want to do.
The Gujarat riots took place frankly because of the way our system is structured, because of the fact that people do not have a voice in the system. And what I want to do. He was CM when Gujarat happened The congress party and the BJP have two completely different philosophies, our attack on the BJP is based on the idea that this country needs to move forward democratically, it needs push democracy deeper into the country, it needs to push democracy into the villagers, it needs to give women democratic powers, it needs to give youngsters democratic powers.
You are talking about India, we have had a 1 hour conversation here, you haven’t asked me 1 question about how we are going to build this country, how we are going to take this country forward, you haven’t asked me one question on how we are going to empower our people, you haven’t asked me one question on what we are going to do for youngsters, you are not interested in that.
There is the Congress Party that believes in openness, that believes in RTI, that believes in Panchayati Raj, that believes in giving people power. The Congress party is an extremely powerful system and all the Congress party needs to do is bring in younger fresher faces in the election which is what we are going to do and we are going to win the election.
In retrospect, repeating a few points several times is a good enough cue for an auto-summarizer to identify important sentences. This interview was perfect for a task like this as Mr. Gandhi repeated the same set of (rote) answers for almost every question that he was asked. Perhaps this is what he was hoping for? To make sure that when lazy journalists use automatic tools to do their jobs, it would give them a perfect output!
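To see why repetition helps, here is a rough sketch of a classic frequency-based extractive summarizer. I have no idea how freesummarizer.com actually works internally, so treat this purely as an assumption: it scores each sentence by the average frequency of its non-stop-words and keeps the top few. The `summarize` function, the stop-word list and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a real resource).
STOP_WORDS = {"the", "a", "an", "is", "was", "are", "to", "of", "and",
              "in", "that", "we", "i", "have"}

def content_words(sentence):
    """Lowercased tokens of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]

def summarize(text, k=2):
    """Pick the k sentences whose words are, on average, repeated most."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(ranked)]

# Made-up text: two sentences repeat the same talking point.
text = ("We have to change the system. The weather was pleasant yesterday. "
        "People want to change the system. I like mangoes.")
print(summarize(text, 2))
```

The two sentences that repeat "change the system" win, which is exactly the effect a speaker gets by repeating the same few points in every answer.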
Now coming to the interesting bit. If you were a human listener like me and wanted to read only the questions that he really did attempt to answer, what would you do? Fear not! I have built an SVM classifier from this transcript that you could make use of in the future. I used LightSide, an open source platform created by CMU LTI researchers, to understand the features in the transcript of his answers. Let’s get into the details then.
When you go for an interview, you can either choose to answer a question or try to avoid it by cleverly diverting from the main question asked. In Rahul’s case, the answers can be mainly grouped into three categories: a) the questions that he answered, b) the questions that he managed to successfully avoid and c) the LOL category (the answer bears no resemblance to the question asked). I combined categories (b) and (c) to come up with two classes: ANSWERED and UNANSWERED. You may check out my list of classes here and read the interview answers from the Times of India article here. They follow the same order as in the transcript, with the exception of single-line question-answer pairs that would’ve otherwise served as noise for machine learning. I selected a total of 114 questions, out of which 45 were answered and the remaining 69 were either successfully avoided or belonged to the LOL category.
For feature extraction, I used quite simple language features like bigrams, trigrams, line length after excluding stop words, etc. You can download them in the LightSide feature format. I used the SVM plugin to learn the classification categories from the features. Here is the final model that the tool built using the extracted features. And the results were surprising (or probably not :). With 10-fold cross validation, the resulting model had an accuracy of over 72%! For so little data, an accuracy like this is considered pretty good (in case you are not familiar with the field); for comparison, always guessing UNANSWERED would score about 60.5% (69 of 114). The machines indeed understand Rahul Gandhi!
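For the curious, here is a rough sketch of what features like these and a 10-fold evaluation look like in plain Python. This is not LightSide's feature format or its SVM plugin: `extract_features`, `k_fold_indices` and the `LEN_NO_STOP` feature name are hypothetical, the stop-word list is a toy one, and instead of an SVM I plug in a trivial majority-class guesser just to show the baseline any real model has to beat on a 45/69 split.

```python
import random
import re
from collections import Counter

# Toy stop-word list (an assumption; LightSide ships its own).
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "we", "i", "have"}

def extract_features(answer):
    """Bigram and trigram counts plus the answer length without stop words."""
    tokens = re.findall(r"[a-z']+", answer.lower())
    feats = Counter()
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    feats["LEN_NO_STOP"] = sum(1 for t in tokens if t not in STOP_WORDS)
    return dict(feats)

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

print(extract_features("We have to change the system"))

# Class sizes from the post: 45 ANSWERED, 69 UNANSWERED.
labels = ["ANSWERED"] * 45 + ["UNANSWERED"] * 69
correct = 0
for train, test in k_fold_indices(len(labels), k=10):
    # Stand-in "model": always predict the most common training label.
    guess = Counter(labels[j] for j in train).most_common(1)[0][0]
    correct += sum(1 for j in test if labels[j] == guess)
baseline = correct / len(labels)
print(f"majority baseline: {baseline:.3f}")  # majority baseline: 0.605
```

To evaluate a real classifier, you would swap the majority guesser for a model trained on the features of the `train` answers in each fold.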
Unfortunately, I did not have enough data to run a couple of separate tests. We’ll probably have to wait for Mr. Gandhi to give his next interview for that. I hope the Congress party members work as hard as the NLP researchers so that we can have a good competition by then!