Clinical Text Processing with Python

We are seeing a rise of Artificial Intelligence in medicine. This has potential for remarkable improvements in diagnosis, prevention and treatment in healthcare. Many of the existing applications are about rapid image interpretation using AI. We have many open opportunities in leveraging NLP for improving both clinical workflows and patient outcomes.

Python has become the language of choice for Natural Language Processing (NLP) in both research and development: from old school NLTK to PyTorch for building state-of-the-art deep learning models. Libraries such as Gensim and spaCy have also enabled production-ready NLP applications. More recently, Hugging Face has built a business around rapidly making current NLP research quickly accessible.

Yesterday, I presented on processing clinical text using Python at the local Python User Group meeting.

During the talk I discussed some opportunities in clinical NLP, mapped out fundamental NLP tasks, and toured the available programming resources– Python libraries and frameworks. Many of these libraries make it extremely easy to leverage state-of-the-art NLP research for building models on clinical text. Towards end of the talk, I also shared some data resources to explore and start hacking on.

It was a fun experience overall and I received some thoughtful comments and feedback — both during the talk and later also online. Special thanks to Pete Fein for organizing the meetup. It was probably the first time I had so many people put on a waitlist for attending one of my presentations. I am also sharing my slides from the talk in hope that they can be useful…

DermaQ Treatment Assistant

I participated in BlueHack this weekend – a hackathon hosted by IBM and AmerisourceBergen. I got a chance to work with an amazing team (Xiaoxiao, Charmgil, Hedy, Siyang and Michael) —  the best kind of team-members you could find at a hackathon. We were mentored by veterans like Nick Adkins (the leader of the PinkSocks tribe!), whose extensive experience was super-handy during the ideation stage of our project.

Our first team-member, Xiaoxiao Li, is a Dermatology resident who came to the hackathon with ideas for a dermatology treatment app. She explained how most dermatology patients come from a younger age-group and are technologically savvy enough to be targeted with app-based treatment plans. We bounced some initial ideas with the team and narrowed down on a treatment companion app for the hackathon.

We picked ‘acne’ as an initial problem to focus on. We were surprised by the billions of dollars that are spent on acne treatments every year. We researched the main problem in failed treatments to be patient non-compliance. This happens when the patients don’t understand the treatment instructions completely, are worried about prescription side-effects, or are just too busy and miss doses. Michael James designed super cool mockups to address these issues:

While schedules and reminders could keep the patients on track, we still needed a solution to answer patients’ questions after they have left the doctor’s office. A chat-based interface offered a feasible solution to transform lengthy home-going instructions into something usable, convenient and accessible. It would save calls to the doctor for simpler questions, while also ensuring that patients clearly understand doctor’s instructions. Since this hackathon was hosted by IBM, we thought that it would be prudent to demo a Watson-powered chatbot. Charmgil Hong and I worked on building live demos. Using a fairly shallow dialogue tree, we were able to build a usable demo during the hackathon. A simple extension to this would be an Alexa-like conversational interface, which can be adopted for patient-education in many other scenarios such as post-surgery instructions etc.:

Demo of our conversational interface built using Watson Assistant

Hedy Chen and Siyang Hu developed a neat business plan to go along as well. We would charge a commitment fee from the patients to use our app. If the patients follow all the steps and instructions for the treatment, we return a 100% of their money back. Otherwise, we make money from targeted skin-care advertisements. I believe that such a model could be useful for building other patient compliance apps as well. Here‘s a link to our slides, if you are interested. Overall, I am super happy with all that we could achieve within just one and a half days! And yes, we did get a third prize for this project 🙂 

Announcing NLPReViz…

Update – 5 Nov’18: Our paper was featured in AMIA 2018 Fall Symposium’s Year-in-Review!

NLPReViz -

We have released the source code for our NLPReViz project. Head to to checkout its project page.

Also, here’s our new JAMIA publication on it:

Gaurav Trivedi, Phuong Pham, Wendy W Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser; NLPReViz: an interactive tool for natural language processing on clinical text. Journal of American Medical Informatics Association. 2017. DOI: 10.1093/jamia/ocx070.

Interactive Natural Language Processing for Legal Text

Update: We received the best student paper award for our paper at JURIX’15!

In an earlier post, I talked about my work on Natural Language Processing in the clinical domain. The main idea behind the project is to enable domain experts to build machine learning models for analyzing text. We do this by designing usable tools for NLP without really having the need to send datasets to machine learning experts or understanding the inner working details of the algorithms. The post also features a demo video of the prototype tool that we have built.

I was presenting this work at my program’s bi-weekly meetings where Jaromir, a fellow ISP graduate student, pointed out that such an approach could be useful for his work as well. Jaromir also holds a degree in Law and works on building AI systems for legal applications. As a result, we ended up collaborating on a project on using the approach for statutory analysis. While, the main topic of discussion in the project is on the framework in which a human experts cooperate with a machine learning text classification algorithm, we also ended up augmenting our approach with a new way of capturing and re-using knowledge. In our tool datasets and models are treated separately and our not tied together. So, if you were building a classification model for say statutes from the state of Alaska, when you need to analyze laws from Kansas you need not start from scratch. This allows us to be in a better starting place in terms of all the performance measures and build a model using fewer training examples.

The results of the cold start (Kansas) and the knowledge re-use (Alaska) experiment. In the Figure KS stands for Kansas, AK for Alaska, 1p and 2p for the first (ML model-oriented) and second (interaction-oriented) evaluation perspectives, P for precision, R for recall, F1 for F1 measure, and ROC with a number for an ROC curve of the ML classifier trained on the specified number of documents.
The results of the cold start (Kansas) and the knowledge re-use (Alaska) experiment. In the Figure KS stands for Kansas, AK for Alaska, P for precision, R for recall, F1 for F1 measure, and ROC with a number for an ROC curve of the ML classifier trained on the specified number of documents.

We will be presenting this work at JURIX’15 during the 28th year of the conference focusing on legal information systems. Previously, we had presented portions of this work at the AMIA Summit on Clinical Research Informatics and at the ACM IUI Workshop on Visual Text Analytics.


JaromĂ­r Ĺ avelka, Gaurav Trivedi, and Kevin Ashley. 2015. Applying an Interactive Machine Learning Approach to Statutory Analysis. In Proceedings of the 28th International Conference on Legal Knowledge and Information Systems (JURIX ’15). Braga, Portugal. [PDF] – Awarded the Best Student Paper (Top 0.01%).

Clinical Text Analysis Using Interactive Natural Language Processing

Update: Here’s our full paper announcement with source-code release…

I am working on a project to support the use of Natural Language Processing in the clinical domain. Modern NLP systems often make use of machine learning techniques. However, physicians and other clinicians, who are interested in analyzing clinical records, may be unfamiliar with these methods. Our project aims to enable such domain experts make use of Natural Language Processing using a point-and-click interface . It combines novel text-visualizations to help its users make sense of NLP results, revise models and understand changes between revisions. It allows them to make any necessary corrections to computed results, thus forming a feedback loop and helping improve the accuracy of the models.

Here’s the walk-through video of the prototype tool that we have built:

At this point we are redesigning some portions of our tool based on feedback from a formative user study with physicians and clinical researchers. Our next step would be to conduct an empirical evaluation of the tool to test our hypotheses about its design goals.

We will be presenting a demo of our tool at the AMIA Summit on Clinical Research Informatics and also at the ACM IUI Workshop on Visual Text Analytics in March.


  1. Gaurav Trivedi. 2015. Clinical Text Analysis Using Interactive Natural Language Processing. In Proceedings of the 20th International Conference on Intelligent User Interfaces Companion (IUI Companion ’15). ACM, New York, NY, USA, 113-116. DOI 10.1145/2732158.2732162 [Presentation] [PDF]
  2. Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, Harry Hochheiser. 2015. An Interactive Tool for Natural Language Processing on Clinical Text. Presented at 4th Workshop on Visual Text Analytics (IUI TextVis 2015), Atlanta. [PDF]
  3. Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe, and Harry Hochheiser. 2015. Bridging the Natural Language Processing Gap: An Interactive Clinical Text Review Tool. Poster presented at the 2015 AMIA Summit on Clinical Research Informatics (CRI 2015). San Francisco. March 2015. [Poster][Abstract]