Bhasha: Sanskrit Input Tool

Typing Sanskrit can be challenging if you don’t have access to your special keyboard, or don’t have your favorite input tools installed on the computer you are working on.

So I set out to write a Google Docs add-on that could make it easy to do so. A Google Docs add-on could be an ideal option for typing Sanskrit, even while using guest computers.

Sanskrit has more sounds than English and other languages written in Roman scripts. Devanagari is the commonly used script used for writing Sanskrit and uses non-ASCII characters. Some systems such as IAST and ITRANS use additional symbols to represent Sanskrit sounds in Roman-like scripts. For example, ā in IAST is used to represent the long “a” syllable.

I found this library called Sanscript.js which could convert between these different writing schemes. However it doesn’t address the problem of being able to use a standard keyboard with ASCII characters. IAST includes additional characters such as ū, ṣ, ñ, ṅ, ṃ, etc. It is more readable than other competing schemes such as ITRANS. ITRANS includes a mix of upper-case and lower-case letters, in the middle of a word, to represent additional sounds. This has adverse affect on the aesthetics of the script. I find it harder to read as well. IAST has also been used in academia and Sanskrit books written in the West since a long time. As a result many users of Sanskrit language are familiar with this system. It was later standardized as ISO 15919 with small changes.

I added a new “IAST Simplified” scheme to the list supported by Sanscript.js. This is inspired from my favorite Sanskrit writing software called Sanskrit Writer. This scheme uses standard ASCII characters and closely resembles the IAST scheme. For example, it uses a for ā, ~n for ñ, and h. for , and so on. The following table explains this scheme in detail:

Vowels

a

-a

i

-i

u

-u

r.

-r.

l.

-l.

e

ai

o

au

m.

h.

Consonants

ka

kha

ga

gha

.na

ca

cha

ja

jha

~na

t.a

t.ha

d.a

d.ha

n.a

ta

tha

da

dha

na

pa

pha

ba

bha

ma

‘sa

s.a

sa

ha

 ळ

_la

ya

ra

la

va

Vowel Marks

क्

k

खा

kh-a

गि

gi

घी

dh-i

ङु

.nu

चू

c-u

छृ

chr.

जॄ

j-r.

झॢ

jhl.

ञॣ

~n-l.

टे

t.e

ठै

t.hai

डो

d.o

डौ

d.au

थ्

tha

णं

n.am

तः

tah.

क्ष

ks.a

त्र

tra

ज्ञ

j~na

Symbols

|

।।

||

0

1

2

3

4

5

6

7

8

9

After these additions to Sanscript.js, I was able to write a quick Google Docs plugin that could convert between different schemes with a click of a button. You can try this plugin out by making a copy of my Google document.

I was hoping to have a WYSIWYG design where transliteration could happen as you typed. This required intercepting each edit, and was not possible using the Google Docs API. As a second option, I wrote a plugin for QuillJS, a cool open-source rich-text editor. You can try out this add-on here: https://trivedigaurav.com/exp/bhasha/.

Screenshot of the QuillJs editor with Bhasha plugin.

This may not be mobile friendly. I didn’t spend time test it on the phone since you can easily switch to a Sanskrit keyboard on a phone anyway.

Source code for the QuillJS plugin is on Github here. Please feel free to hack on it!

Announcing NLPReViz…

Update – 5 Nov’18: Our paper was featured in AMIA 2018 Fall Symposium’s Year-in-Review!

NLPReViz - http://nlpreviz.github.io

We have released the source code for our NLPReViz project. Head to http://nlpreviz.github.io to checkout its project page.

Also, here’s our new JAMIA publication on it:

Gaurav Trivedi, Phuong Pham, Wendy W Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser; NLPReViz: an interactive tool for natural language processing on clinical text. Journal of American Medical Informatics Association. 2017. DOI: 10.1093/jamia/ocx070.