Bhasha: Sanskrit Input Tool

Typing Sanskrit can be challenging if you don’t have access to your special keyboard, or don’t have your favorite input tools installed on the computer you are working on.

So I set out to write a Google Docs add-on that could make it easy to do so. A Google Docs add-on could be an ideal option for typing Sanskrit, even while using guest computers.

Sanskrit has more sounds than English and other languages written in Roman scripts. Devanagari is the commonly used script used for writing Sanskrit and uses non-ASCII characters. Some systems such as IAST and ITRANS use additional symbols to represent Sanskrit sounds in Roman-like scripts. For example, ā in IAST is used to represent the long “a” syllable.

I found this library called Sanscript.js which could convert between these different writing schemes. However it doesn’t address the problem of being able to use a standard keyboard with ASCII characters. IAST includes additional characters such as ū, ṣ, ñ, ṅ, ṃ, etc. It is more readable than other competing schemes such as ITRANS. ITRANS includes a mix of upper-case and lower-case letters, in the middle of a word, to represent additional sounds. This has adverse affect on the aesthetics of the script. I find it harder to read as well. IAST has also been used in academia and Sanskrit books written in the West since a long time. As a result many users of Sanskrit language are familiar with this system. It was later standardized as ISO 15919 with small changes.

I added a new “IAST Simplified” scheme to the list supported by Sanscript.js. This is inspired from my favorite Sanskrit writing software called Sanskrit Writer. This scheme uses standard ASCII characters and closely resembles the IAST scheme. For example, it uses a for ā, ~n for ñ, and h. for , and so on. The following table explains this scheme in detail:

Vowels

a

-a

i

-i

u

-u

r.

-r.

l.

-l.

e

ai

o

au

m.

h.

Consonants

ka

kha

ga

gha

.na

ca

cha

ja

jha

~na

t.a

t.ha

d.a

d.ha

n.a

ta

tha

da

dha

na

pa

pha

ba

bha

ma

‘sa

s.a

sa

ha

 ळ

_la

ya

ra

la

va

Vowel Marks

क्

k

खा

kh-a

गि

gi

घी

dh-i

ङु

.nu

चू

c-u

छृ

chr.

जॄ

j-r.

झॢ

jhl.

ञॣ

~n-l.

टे

t.e

ठै

t.hai

डो

d.o

डौ

d.au

थ्

tha

णं

n.am

तः

tah.

क्ष

ks.a

त्र

tra

ज्ञ

j~na

Symbols

|

।।

||

0

1

2

3

4

5

6

7

8

9

After these additions to Sanscript.js, I was able to write a quick Google Docs plugin that could convert between different schemes with a click of a button. You can try this plugin out by making a copy of my Google document.

I was hoping to have a WYSIWYG design where transliteration could happen as you typed. This required intercepting each edit, and was not possible using the Google Docs API. As a second option, I wrote a plugin for QuillJS, a cool open-source rich-text editor. You can try out this add-on here: https://trivedigaurav.com/exp/bhasha/.

Screenshot of the QuillJs editor with Bhasha plugin.

This may not be mobile friendly. I didn’t spend time test it on the phone since you can easily switch to a Sanskrit keyboard on a phone anyway.

Source code for the QuillJS plugin is on Github here. Please feel free to hack on it!

Clinical Text Processing with Python

We are seeing a rise of Artificial Intelligence in medicine. This has potential for remarkable improvements in diagnosis, prevention and treatment in healthcare. Many of the existing applications are about rapid image interpretation using AI. We have many open opportunities in leveraging NLP for improving both clinical workflows and patient outcomes.

Python has become the language of choice for Natural Language Processing (NLP) in both research and development: from old school NLTK to PyTorch for building state-of-the-art deep learning models. Libraries such as Gensim and spaCy have also enabled production-ready NLP applications. More recently, Hugging Face has built a business around rapidly making current NLP research quickly accessible.

Yesterday, I presented on processing clinical text using Python at the local Python User Group meeting.

During the talk I discussed some opportunities in clinical NLP, mapped out fundamental NLP tasks, and toured the available programming resources– Python libraries and frameworks. Many of these libraries make it extremely easy to leverage state-of-the-art NLP research for building models on clinical text. Towards end of the talk, I also shared some data resources to explore and start hacking on.

It was a fun experience overall and I received some thoughtful comments and feedback — both during the talk and later also online. Special thanks to Pete Fein for organizing the meetup. It was probably the first time I had so many people put on a waitlist for attending one of my presentations. I am also sharing my slides from the talk in hope that they can be useful…


Machines learn to play Tabla, Part – 2

This is a followup on my earlier post on Machines Learn to play Tabla. You may wish it check it out first reading this one…

Three years ago, I published a post on using recurrent neural networks to generate tabla rhythms. Sampling music from machine learned models was not in vogue then. My post received a lot of attention on the web and became very popular. The project had been a proof-of-concept and I have wanted build on it for a long time now.

This weekend, I worked on making it more interactive and I am excited to share these updates with you. Previously, I was using a proprietary software to convert tabla notation to sound. That made it hard to experiment with sampled rhythms and I could share only a handful sounds. Taking inspiration from our friends at Vishwamohini, I am now able to convert bols into rhythm on the fly using MIDI.js.

Let me show off the new javascript synthesizer using a popular Delhi kaida. Hit the ‘play’ button to listen:

Now that you’ve heard the computer play, here’s an example of it being played by a tabla maestro:

Of course, the synthesized outcome is not much of a comparison to the performance by the maestro, but it is not too bad either…

Now to the more exciting part- Since our browsers have learned to play the tabla, we can throw in the char-rnn model that I built in the earlier post.  To do this, I used the RecurrentJS library and combined it with my javascript tabla player:

Feel free to play around with tempo and maximum character-limit for sampling. When you click on ‘generate’,  it will play a new rhythm every time. Hope you’ll enjoy playing with it as much as I did!

The player has a few kinks at this point I am working towards fixing them. You too can contribute to my repository on GitHub.

There are two areas that need major work:

Data: The models that I trained for my earlier post was done using a small amount of training data. I have been on a lookout for better dataset since then. I wrote a few emails, but without much success till now. I am interested in knowing about more datasets I could train my models on.

Modeling: Our model did a very good job of understanding the structure of TaalMala notations. Although character level recurrent neural networks work well, it is still based on very shallow understanding of the rhythmic structures. I have not come across any good approaches for generating true rhythms yet:

I think more data samples covering a range of rhythmic structures would only partially address this problem. Simple rule based approaches seem to outperform machine learned models with very little effort. Vishwamohini.com has some very good rule-based variation generators that you could check out.  They sound better than the ones created by our AI. After all the word for compositions- bandish, literally derived from ‘rules’ in Hindi. But on the other hand, there are only so many handcrafted rules that you can come up with which may lead to generating repetitive sounds.

Contact me if you have some ideas and if you’d like to help out! Hope that I am able to post an update on this sooner than three years this time 😀

Announcing NLPReViz…

Update – 5 Nov’18: Our paper was featured in AMIA 2018 Fall Symposium’s Year-in-Review!

NLPReViz - http://nlpreviz.github.io

We have released the source code for our NLPReViz project. Head to http://nlpreviz.github.io to checkout its project page.

Also, here’s our new JAMIA publication on it:

Gaurav Trivedi, Phuong Pham, Wendy W Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser; NLPReViz: an interactive tool for natural language processing on clinical text. Journal of American Medical Informatics Association. 2017. DOI: 10.1093/jamia/ocx070.

Interactive Natural Language Processing for Legal Text

Update: We received the best student paper award for our paper at JURIX’15!

In an earlier post, I talked about my work on Natural Language Processing in the clinical domain. The main idea behind the project is to enable domain experts to build machine learning models for analyzing text. We do this by designing usable tools for NLP without really having the need to send datasets to machine learning experts or understanding the inner working details of the algorithms. The post also features a demo video of the prototype tool that we have built.

I was presenting this work at my program’s bi-weekly meetings where Jaromir, a fellow ISP graduate student, pointed out that such an approach could be useful for his work as well. Jaromir also holds a degree in Law and works on building AI systems for legal applications. As a result, we ended up collaborating on a project on using the approach for statutory analysis. While, the main topic of discussion in the project is on the framework in which a human experts cooperate with a machine learning text classification algorithm, we also ended up augmenting our approach with a new way of capturing and re-using knowledge. In our tool datasets and models are treated separately and our not tied together. So, if you were building a classification model for say statutes from the state of Alaska, when you need to analyze laws from Kansas you need not start from scratch. This allows us to be in a better starting place in terms of all the performance measures and build a model using fewer training examples.

The results of the cold start (Kansas) and the knowledge re-use (Alaska) experiment. In the Figure KS stands for Kansas, AK for Alaska, 1p and 2p for the first (ML model-oriented) and second (interaction-oriented) evaluation perspectives, P for precision, R for recall, F1 for F1 measure, and ROC with a number for an ROC curve of the ML classifier trained on the specified number of documents.
The results of the cold start (Kansas) and the knowledge re-use (Alaska) experiment. In the Figure KS stands for Kansas, AK for Alaska, P for precision, R for recall, F1 for F1 measure, and ROC with a number for an ROC curve of the ML classifier trained on the specified number of documents.

We will be presenting this work at JURIX’15 during the 28th year of the conference focusing on legal information systems. Previously, we had presented portions of this work at the AMIA Summit on Clinical Research Informatics and at the ACM IUI Workshop on Visual Text Analytics.

References

Jaromír Šavelka, Gaurav Trivedi, and Kevin Ashley. 2015. Applying an Interactive Machine Learning Approach to Statutory Analysis. In Proceedings of the 28th International Conference on Legal Knowledge and Information Systems (JURIX ’15). Braga, Portugal. [PDF] – Awarded the Best Student Paper (Top 0.01%).

Machines learn to play Tabla

Update: This post now has a Part 2.

If you follow machine learning topics in the news, I am sure by now you would have come across Andrej Karpathy‘s blog post on The Unreasonable Effectiveness of Recurrent Neural Networks.[1] Apart from the post itself, I have found it very fascinating to read about the diverse applications that its readers have found for it. Since then I have spent several hours hacking with different machine learning models to compose tabla rhythms:

Although Tabla does not have a standardized musical notation that is accepted by all, it does have a language based on the bols (literally, verbalize in English) or the sounds of the strokes played on it. These bols may be expressed in written form which when pronounced in Indian languages sound like the drums. For example, the theka for the commonly used 16-beat cycle – Teental is written as follows:

Dha | Dhin | Dhin | Dha | Dha | Dhin | Dhin | Dha 
Dha | Tin  | Tin  | Ta  | Ta  | Dhin | Dhin | Dha

For this task, I made use of Abhijit Patait‘s software – TaalMala, which provides a GUI environment for composing Tabla rhythms in this language. The bols can then be synthesized to produce the sound of the drum. In his software, Abhijit extended the tabla language to make it easier for users to compose tabla rhythms by adding a square brackets after each bol that specify the number of beats within which it must be played. You could also lay more emphasis on a particular bol by adding ‘+’ symbols which increased their intensity when synthesized to sound. Variations of standard bols can be defined as well based on different the hand strokes used:

Dha1 = Na + First Closed then Open Ge

Now that we are armed with this background knowledge, it is easy to see how we may attempt to learn tabla like a language model using Natural Language Processing techniques. Predictive modeling of tabla has been previously explored in "N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition" (Avinash Sastry, 2011). But, I was not able to get access to the datasets used in the study and had to rely on the compositions that came with the TaalMala software.[2] This is comparatively a much smaller database than what you would otherwise use to train a neural network: It comprises of 207 rhythms with 6,840 bols in all. I trained a char-rnn and sampled some compositions after priming it with different seed text such as “Dha”, “Na” etc. Given below is a minute long composition sampled from my network. We can see that not only the network has learned the TaalMala notation but it has also understood some common phrases used in compositions such as the occurrence of the phrase “TiRa KiTa“, repetitions of “Tun Na” etc.:

Ti [0.50] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki
| Ta | Tun [0.50] | Na | Dhin | Na 
| Tun | Na | Tun | Na | Dha | Dhet | Dha | Dhet | Dha | Dha
| Tun | Na | Dha | Tun | Na | Ti | Na | Dha | Ti | Te | Ki |
Ti | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] |
Dhin | Dhin | Dha | Ge | Ne | Dha | Dha | Tun | Na | Ti
[0.25] | Ra | Ki | Ta | Dha [0.50] | Ti [0.25] | Ra | Ki |
Te | Dha [1.00] | Ti | Dha | Ti [0.25] | Ra | Ki | Te | Dha
[0.50] | Dhet | Dhin | Dha | Tun | Na | Ti [0.25] | Ra | Ki
| Ta | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Ti | Ka | Tra
[0.50] | Ti | Ti | Te | Na [0.50] | Ki [0.50] | Dhin [0.13]
| Ta | Ti [0.25] | Ra | Ki | Te | Tra | Ka | Ti [0.25] | Ra
| Ki | Te | Dhin [0.50] | Na [0.25] | Ti [0.25] | Ra | Ki |
Te | Tra | Ka | Dha [0.34] | Ti [0.25] | Ra | Ki | Ta | Tra
| Ka | Tra [0.50] | Ki [0.50] | Tun [0.50] | Dha [0.50] | Ti
[0.25] | Ra | Ki | Ta | Tra | Ka | Ta | Te | Ti | Ta | Kat |
Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Dha
[0.50] | Dhin | Dhin | Dhin | Dha | Tun | Na | Ti | Na | Ki
| Ta | Dha [0.50] | Dha | Ti [0.50] | Ra | Ki | Te | Tun
[0.50] | Tra [0.25] | Ti [0.25] | Ra | Ki | Te | Tun | Ka |
Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Ki [0.25] | Ti | Dha
| Ti | Ta | Dha | Ti | Dha [0.50] | Ti | Na | Dha | Ti
[0.25] | Ra | Ki | Te | Dhin [0.50] | Na | Ti [0.25] | Ra |
Ki | Te | Tra | Ka | Dha [0.50] | Ti [0.50] | Ra | Ki | Te |
Tun [0.50] | Na | Ki [0.25] | Te | Dha | Ki | Dha [0.50] |
Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki
| Te | Dha [0.50] | Tun | Ti [0.25] | Ra | Ki | Te | Dhin
[0.50] | Na | Ti [0.25] | Te | Dha | Ki [0.25] | Te | Ki |
Te | Dhin [0.50] | Dhin | Dhin | Dhin | Dha | Dha | Tun | Na
| Na | Na | Ti [0.25] | Ra | Ki | Ta | Ta | Ka | Dhe [0.50]
| Ti [0.25] | Ra | Ki | Te | Ti | Re | Ki | Te | Dha [0.50]
| Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Ti |
Te | Ti | Te | Ti | Te | Dha [0.50] | Ti [0.25] | Te | Ra |
Ki | Te | Dha [0.50] | Ki | Te | Dha | Ti [0.25]

Here’s a loop that I synthesized by pasting a composition sampled 4 times one after the another:

Of course, I also tried training n-gram models and the smoothing methods using the SRILM toolkit. Adding spaces between letters is a quick hack that can be used to train character level models using existing toolkits. Which one produces better compositions? I can’t tell for now but I am trying to collect more data and hope to add updates to this post as and when I find time to work on it. I am not confident if simple perplexity scores may be enough to judge the differences between two models, specially on the rhythmic quality of the compositions. There are many ways in which one can extend this work. One there is a possibility of training on different kinds of compositions: kaidas, relas, laggis etc., different rhythm cycles and also from different gharanas. All of this would required collecting a bigger composition database:

And then there is a scope for allowing humans to interactively edit compositions at places where AI goes wrong. You could also use the samples generated by it as an infinite source of inspiration.

Finally, here’s a link to the work in progress playlist of the rhythms I have sampled till now.

References

  1. Avinash Sastry (2011), N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition. Available: https://smartech.gatech.edu/bitstream/handle/1853/42792/sastry_avinash_201112_mast.pdf?sequence=1.

Footnotes

  1. If you encountered a lot of new topics in this post, you may find this post on Understanding natural language using deep neural networks and the series of videos on Deep NN by Quoc Le helpful. ^
  2. On the other hand, Avinash Sastry‘s work uses a more elaborate Humdrum notation for writing tabla compositions but is not as easy to comprehend for tabla players. ^

Clinical Text Analysis Using Interactive Natural Language Processing

Update: Here’s our full paper announcement with source-code release…

I am working on a project to support the use of Natural Language Processing in the clinical domain. Modern NLP systems often make use of machine learning techniques. However, physicians and other clinicians, who are interested in analyzing clinical records, may be unfamiliar with these methods. Our project aims to enable such domain experts make use of Natural Language Processing using a point-and-click interface . It combines novel text-visualizations to help its users make sense of NLP results, revise models and understand changes between revisions. It allows them to make any necessary corrections to computed results, thus forming a feedback loop and helping improve the accuracy of the models.

Here’s the walk-through video of the prototype tool that we have built:

At this point we are redesigning some portions of our tool based on feedback from a formative user study with physicians and clinical researchers. Our next step would be to conduct an empirical evaluation of the tool to test our hypotheses about its design goals.

We will be presenting a demo of our tool at the AMIA Summit on Clinical Research Informatics and also at the ACM IUI Workshop on Visual Text Analytics in March.

References

  1. Gaurav Trivedi. 2015. Clinical Text Analysis Using Interactive Natural Language Processing. In Proceedings of the 20th International Conference on Intelligent User Interfaces Companion (IUI Companion ’15). ACM, New York, NY, USA, 113-116. DOI 10.1145/2732158.2732162 [Presentation] [PDF]
  2. Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser. 2015. An Interactive Tool for Natural Language Processing on Clinical Text. Presented at 4th Workshop on Visual Text Analytics (IUI TextVis 2015), Atlanta. http://vialab.science.uoit.ca/textvis2015/ [PDF]
  3. Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, and Harry Hochheiser. 2015. Bridging the Natural Language Processing Gap: An Interactive Clinical Text Review Tool. Poster presented at the 2015 AMIA Summit on Clinical Research Informatics (CRI 2015). San Francisco. March 2015. [Poster][Abstract]