{"id":1703,"date":"2015-05-29T00:10:46","date_gmt":"2015-05-29T00:10:46","guid":{"rendered":"http:\/\/www.trivedigaurav.com\/blog\/?p=1703"},"modified":"2020-09-27T18:01:34","modified_gmt":"2020-09-27T18:01:34","slug":"machines-learn-to-play-tabla","status":"publish","type":"post","link":"https:\/\/www.trivedigaurav.com\/blog\/machines-learn-to-play-tabla\/","title":{"rendered":"Machines learn to play Tabla"},"content":{"rendered":"<blockquote><p>Update: This post now has a <a href=\"http:\/\/www.trivedigaurav.com\/blog\/machines-learn-to-play-tabla-part-2\/\">Part 2<\/a>.<\/p><\/blockquote>\n<p>If you follow machine learning topics in the news, I am sure by now you would have come across <a href=\"http:\/\/cs.stanford.edu\/people\/karpathy\/\">Andrej Karpathy<\/a>&#8216;s blog post on <a href=\"http:\/\/karpathy.github.io\/2015\/05\/21\/rnn-effectiveness\/\">The Unreasonable Effectiveness of Recurrent Neural Networks<\/a>.<a href=\"#note9efcf3609dc07e3a3bf04ac9c55e3779\" name=\"9efcf3609dc07e3a3bf04ac9c55e3779\" title=\"If you encountered a lot of new topics in this post, you may find this post on Understanding natural language using deep neural networks and the series of videos on Deep NN by Quoc Le helpful.\" style=\"text-decoration:none\"><sup>[1]<\/sup><\/a>  Apart from the post itself, I have found it very fascinating to read about the diverse applications that its readers have found for it. 
Since then I have spent several hours hacking with different machine learning models to compose <a href=\"http:\/\/en.wikipedia.org\/wiki\/Tabla\">tabla<\/a> rhythms:<\/p>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p dir=\"ltr\" lang=\"en\">Inspired by <a href=\"https:\/\/twitter.com\/seaandsailor\">@seaandsailor<\/a>, used <a href=\"https:\/\/twitter.com\/karpathy\">@karpathy<\/a>&#8216;s char-rnn to make a tabla rhythm <a href=\"https:\/\/t.co\/kqzZG3q2A2\">https:\/\/t.co\/kqzZG3q2A2<\/a> Amazed how well it learnt on small data<\/p>\n<p>\u2014 Gaurav Trivedi (@trivedigaurav) <a href=\"https:\/\/twitter.com\/trivedigaurav\/status\/603024240686817281\">May 26, 2015<\/a><\/p><\/blockquote>\n<p><script src=\"\/\/platform.twitter.com\/widgets.js\" async=\"\" charset=\"utf-8\"><\/script>Although tabla does not have a standardized musical notation that is accepted by all, it does have a language based on the <em>bols<\/em> (literally, <em>verbalize<\/em> in English), or the sounds of the strokes played on it. These <em>bols<\/em> may be expressed in written form, which when pronounced in Indian languages <a href=\"https:\/\/www.merriam-webster.com\/dictionary\/onomatopoeia\">sounds like the drums<\/a>. For example, the <em>theka<\/em> for the commonly used 16-beat cycle &#8211; <em>Teental<\/em> &#8211; is written as follows:<\/p>\n<pre>Dha | Dhin | Dhin | Dha | Dha | Dhin | Dhin | Dha \nDha | Tin&nbsp; | Tin&nbsp; | Ta  | Ta&nbsp; | Dhin | Dhin | Dha\n<\/pre>\n<p>For this task, I made use of <a href=\"https:\/\/www.linkedin.com\/in\/patait\">Abhijit Patait<\/a>&#8216;s software &#8211; <a href=\"http:\/\/taalmala.com\/\">TaalMala<\/a> &#8211; which provides a GUI environment for composing tabla rhythms in this language. The <em>bols<\/em> can then be synthesized to produce the sound of the drum. 
In his software, Abhijit <a href=\"http:\/\/www.taalmala.com\/help.shtml\">extended<\/a> the tabla language to make it easier for users to compose tabla rhythms: square brackets after each <em>bol<\/em> specify the number of beats within which it must be played. You can also lay more emphasis on a particular <em>bol<\/em> by adding &#8216;+&#8217; symbols, which increase its intensity when synthesized. Variations of standard <em>bols<\/em> can be defined as well, based on the different hand strokes used:<\/p>\n<pre>Dha1 = Na + First Closed then Open Ge<\/pre>\n<p>Armed with this background knowledge, it is easy to see how we may attempt to model tabla as a language using Natural Language Processing techniques. Predictive modeling of tabla has been explored before in <em>\"N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition\"<\/em> (Avinash Sastry, 2011). However, I was not able to get access to the datasets used in that study and had to rely on the compositions that came with the TaalMala software.<a href=\"#notef871d50fa24d0467364724f7aa7f8e0b\" name=\"f871d50fa24d0467364724f7aa7f8e0b\" title=\"On the other hand, Avinash Sastry&#8216;s work uses a more elaborate Humdrum notation for writing tabla compositions but is not as easy to comprehend for tabla players.\" style=\"text-decoration:none\"><sup>[2]<\/sup><\/a> This is a much smaller database than you would otherwise use to train a neural network: it comprises 207 rhythms with 6,840 <em>bols<\/em> in all. I trained a char-rnn on it and sampled some compositions after priming the network with different seed texts such as &#8220;Dha&#8221;, &#8220;Na&#8221;, etc. Given below is a minute-long composition sampled from my network. 
We can see that not only has the network learned the TaalMala notation, but it has also picked up some common phrases used in compositions, such as the phrase &#8220;<em>TiRa KiTa<\/em>&#8221; and repetitions of &#8220;<em>Tun Na<\/em>&#8221;:<\/p>\n<pre style=\"overflow: scroll; height: 150px;\">Ti [0.50] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki\n| Ta | Tun [0.50] | Na | Dhin | Na \n| Tun | Na | Tun | Na | Dha | Dhet | Dha | Dhet | Dha | Dha\n| Tun | Na | Dha | Tun | Na | Ti | Na | Dha | Ti | Te | Ki |\nTi | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] |\nDhin | Dhin | Dha | Ge | Ne | Dha | Dha | Tun | Na | Ti\n[0.25] | Ra | Ki | Ta | Dha [0.50] | Ti [0.25] | Ra | Ki |\nTe | Dha [1.00] | Ti | Dha | Ti [0.25] | Ra | Ki | Te | Dha\n[0.50] | Dhet | Dhin | Dha | Tun | Na | Ti [0.25] | Ra | Ki\n| Ta | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Ti | Ka | Tra\n[0.50] | Ti | Ti | Te | Na [0.50] | Ki [0.50] | Dhin [0.13]\n| Ta | Ti [0.25] | Ra | Ki | Te | Tra | Ka | Ti [0.25] | Ra\n| Ki | Te | Dhin [0.50] | Na [0.25] | Ti [0.25] | Ra | Ki |\nTe | Tra | Ka | Dha [0.34] | Ti [0.25] | Ra | Ki | Ta | Tra\n| Ka | Tra [0.50] | Ki [0.50] | Tun [0.50] | Dha [0.50] | Ti\n[0.25] | Ra | Ki | Ta | Tra | Ka | Ta | Te | Ti | Ta | Kat |\nTi | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Dha\n[0.50] | Dhin | Dhin | Dhin | Dha | Tun | Na | Ti | Na | Ki\n| Ta | Dha [0.50] | Dha | Ti [0.50] | Ra | Ki | Te | Tun\n[0.50] | Tra [0.25] | Ti [0.25] | Ra | Ki | Te | Tun | Ka |\nTi [0.25] | Ra | Ki | Te | Dha [0.50] | Ki [0.25] | Ti | Dha\n| Ti | Ta | Dha | Ti | Dha [0.50] | Ti | Na | Dha | Ti\n[0.25] | Ra | Ki | Te | Dhin [0.50] | Na | Ti [0.25] | Ra |\nKi | Te | Tra | Ka | Dha [0.50] | Ti [0.50] | Ra | Ki | Te |\nTun [0.50] | Na | Ki [0.25] | Te | Dha | Ki | Dha [0.50] |\nTi [0.25] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki\n| Te | Dha [0.50] | Tun | Ti [0.25] | Ra | Ki | Te | Dhin\n[0.50] | Na | Ti [0.25] | Te | Dha | Ki [0.25] | Te | Ki |\nTe | Dhin 
[0.50] | Dhin | Dhin | Dhin | Dha | Dha | Tun | Na\n| Na | Na | Ti [0.25] | Ra | Ki | Ta | Ta | Ka | Dhe [0.50]\n| Ti [0.25] | Ra | Ki | Te | Ti | Re | Ki | Te | Dha [0.50]\n| Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Ti |\nTe | Ti | Te | Ti | Te | Dha [0.50] | Ti [0.25] | Te | Ra |\nKi | Te | Dha [0.50] | Ki | Te | Dha | Ti [0.25]<\/pre>\n<p><iframe loading=\"lazy\" src=\"https:\/\/w.soundcloud.com\/player\/?url=https%3A\/\/api.soundcloud.com\/tracks\/207733679&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false\" scrolling=\"no\" width=\"100%\" height=\"166\" frameborder=\"no\"><\/iframe><\/p>\n<p>Here&#8217;s a loop that I synthesized by pasting a sampled composition four times, one after the other:<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/w.soundcloud.com\/player\/?url=https%3A\/\/api.soundcloud.com\/tracks\/207239283&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false\" scrolling=\"no\" width=\"100%\" height=\"166\" frameborder=\"no\"><\/iframe><\/p>\n<p>Of course, I also tried training <em>n<\/em>-gram models with various smoothing methods using the <a href=\"http:\/\/www.speech.sri.com\/projects\/srilm\/\">SRILM toolkit<\/a>. Adding spaces between letters is a quick hack that can be used to train character-level models using existing word-level toolkits. Which one produces better compositions? I can&#8217;t tell yet, but I am trying to collect more data and hope to update this post as and when I find time to work on it. I am also not confident that simple perplexity scores are enough to judge the differences between the two models, especially with respect to the rhythmic quality of the compositions. There are many ways in which one can extend this work. 
One possibility is training on different kinds of compositions &#8211; <em>kaidas, relas, laggis<\/em>, etc. &#8211; on different rhythm cycles, and on compositions from different <em>gharanas<\/em>. All of this would require collecting a bigger composition database:<\/p>\n<blockquote class=\"twitter-tweet\" lang=\"en\"><p>If you have access to any good tabla compositions database(s) please do let me know. Thanks! \u2014 Gaurav Trivedi (@trivedigaurav) <a href=\"https:\/\/twitter.com\/trivedigaurav\/status\/603046221087911936\">May 26, 2015<\/a>&nbsp;<\/p><\/blockquote>\n<p><script src=\"\/\/platform.twitter.com\/widgets.js\" async=\"\" charset=\"utf-8\"><\/script><\/p>\n<p>There is also scope for letting humans interactively edit compositions at the places where the AI goes wrong. You could even use the samples it generates as an infinite source of inspiration.<\/p>\n<p>Finally, <a href=\"https:\/\/soundcloud.com\/trivedigaurav\/sets\/machine-learned\">here&#8217;s a link<\/a> to the work-in-progress playlist of the rhythms I have sampled so far.<\/p>\n<p><h2>References<\/h2><ol><li>Avinash Sastry (2011), <em>N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition<\/em>. Available: <a href=\"https:\/\/smartech.gatech.edu\/bitstream\/handle\/1853\/42792\/sastry_avinash_201112_mast.pdf?sequence=1\">https:\/\/smartech.gatech.edu\/bitstream\/handle\/1853\/42792\/sastry_avinash_201112_mast.pdf?sequence=1<\/a>.<\/li><\/ol><h2>Footnotes<\/h2><ol><li><a name=\"note9efcf3609dc07e3a3bf04ac9c55e3779\"><\/a> If you encountered a lot of new topics in this post, you may find this post on <a href=\"http:\/\/devblogs.nvidia.com\/parallelforall\/understanding-natural-language-deep-neural-networks-using-torch\/\">Understanding natural language using deep neural networks<\/a> and the <a href=\"http:\/\/www.trivedigaurav.com\/blog\/quoc-les-lectures-on-deep-learning\/\">series of videos on Deep NN by Quoc Le<\/a> helpful. 
<a href=\"#9efcf3609dc07e3a3bf04ac9c55e3779\" style=\"text-decoration:none;font-weight:bold\">^<\/a><\/li><li><a name=\"notef871d50fa24d0467364724f7aa7f8e0b\"><\/a> On the other hand, <a href=\"https:\/\/www.linkedin.com\/pub\/avinash-sastry\/\">Avinash Sastry<\/a>&#8216;s work uses a more elaborate Humdrum notation for writing tabla compositions but is not as easy to comprehend for tabla players. <a href=\"#f871d50fa24d0467364724f7aa7f8e0b\" style=\"text-decoration:none;font-weight:bold\">^<\/a><\/li><\/ol><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Update: This post now has a Part 2. If you follow machine learning topics in the news, I am sure by now you would have come across Andrej Karpathy&#8216;s blog post on The Unreasonable Effectiveness of Recurrent Neural Networks. Apart from the post itself, I have found it very fascinating to read about the diverse &hellip; <a href=\"https:\/\/www.trivedigaurav.com\/blog\/machines-learn-to-play-tabla\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Machines learn to play 
Tabla<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6,21,13,47],"tags":[],"class_list":["post-1703","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-fun","category-machine-learning","category-natural-language-processing"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p46eol-rt","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/posts\/1703","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/comments?post=1703"}],"version-history":[{"count":40,"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/posts\/1703\/revisions"}],"predecessor-version":[{"id":2942,"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/posts\/1703\/revisions\/2942"}],"wp:attachment":[{"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/media?parent=1703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/categories?post=1703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.trivedigaurav.com\/blog\/wp-json\/wp\/v2\/tags?post=1703"}],"curies":[{"name":"wp","href
":"https:\/\/api.w.org\/{rel}","templated":true}]}}
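The "adding spaces between letters" hack mentioned in the post (for training character-level models with word-level toolkits such as SRILM) can be sketched as below. This is an illustrative sketch only, not code from the original post; the function names and the `<w>` boundary marker are my own choices for keeping the transformation reversible.

```python
def chars_to_tokens(composition: str) -> str:
    """Rewrite a line so every character becomes its own whitespace-separated
    token, which lets a word-level n-gram toolkit treat it as character data.
    Original word boundaries are marked with a special <w> token."""
    tokens = []
    for word in composition.split():
        tokens.extend(list(word))  # one token per character
        tokens.append("<w>")       # remember where the real space was
    return " ".join(tokens[:-1])   # drop the trailing boundary marker

def tokens_to_chars(tokenized: str) -> str:
    """Invert chars_to_tokens: rejoin characters, turning <w> back into spaces."""
    return "".join(" " if t == "<w>" else t for t in tokenized.split())

# Round-trip check on a theka fragment from the post:
line = "Dha | Dhin | Dhin | Dha"
spaced = chars_to_tokens(line)
assert tokens_to_chars(spaced) == line
```

Each transformed line can then be fed to a toolkit's standard training command as if the characters were words, and sampled output can be mapped back with the inverse function.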