Clinical Text Processing with Python

We are seeing a rise of Artificial Intelligence in medicine. This has potential for remarkable improvements in diagnosis, prevention and treatment in healthcare. Many of the existing applications are about rapid image interpretation using AI. We have many open opportunities in leveraging NLP for improving both clinical workflows and patient outcomes.

Python has become the language of choice for Natural Language Processing (NLP) in both research and development: from old school NLTK to PyTorch for building state-of-the-art deep learning models. Libraries such as Gensim and spaCy have also enabled production-ready NLP applications. More recently, Hugging Face has built a business around rapidly making current NLP research quickly accessible.

Yesterday, I presented on processing clinical text using Python at the local Python User Group meeting.

During the talk I discussed some opportunities in clinical NLP, mapped out fundamental NLP tasks, and toured the available programming resources– Python libraries and frameworks. Many of these libraries make it extremely easy to leverage state-of-the-art NLP research for building models on clinical text. Towards end of the talk, I also shared some data resources to explore and start hacking on.

It was a fun experience overall and I received some thoughtful comments and feedback — both during the talk and later also online. Special thanks to Pete Fein for organizing the meetup. It was probably the first time I had so many people put on a waitlist for attending one of my presentations. I am also sharing my slides from the talk in hope that they can be useful…


AMIA 2017

I attended my first AMIA meeting last week. It was an exciting experience to meet with close to 2,500 informaticians at once. It was also a bit overwhelming due to the scale of the event as well as being in company of famous researchers whose papers you have read:

Twitter log from the 2017 AMIA Annual Symposium held from Nov 4 – 8 in Washington DC. AMIA brings together informatics researchers, professionals, students, and everyone using informatics in health care… [Click to view]

If you weren’t able to attend the event in person, the good news is that the a lot of informaticians are big into documenting stuff on twitter. Check out my twitter moment here and the hashtag #AMIA2017 for more…

On Clippy and building software assistants

I have been attending a reading group on visualization tools for the last few weeks. This is a unique multi-institution group that meets over web-conferencing at 4 PM EST / 1 PM PST on Fridays. It includes a diverse bunch of participants including non-academic researchers.

Every week we vote on and discuss a range of topics related to building tools for visualizing data.

This week, it was my turn to lead a discussion on the Lumiere paper. This is the research responsible for the now retired Clippy Office assistant. I also noticed a strong ISP presence in the references section as the paper focuses on Bayesian user modeling.

During the discussion, we talked about how we can offer help to use vis tools better. Here are my slides from it:


References

  1. Eric Horvitz, Jack Breese, David Heckerman, David Hovel, and Koos Rommelse. 1998. The lumière project: Bayesian user modeling for inferring the goals and needs of software users. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence (UAI’98), Gregory F. Cooper and Serafín Moral (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 256-265.
  2. Justin Matejka, Wei Li, Tovi Grossman, and George Fitzmaurice. 2009. CommunityCommands: command recommendations for software applications. In Proceedings of the 22nd annual ACM symposium on User interface software and technology. ACM, New York, NY, USA, 193-202.

Quoc Le’s Lectures on Deep Learning

Update:  Dr. Le has posted tutorials on this topic: Part 1 and Part 2.

Dr. Quoc Le from the Google Brain project team (yes, the one that made headlines for creating a cat recognizer) presented a series of lectures at the Machine Learning Summer School (MLSS ’14) in Pittsburgh this week. This is my favorite lecture series from the event till now and I was glad to be able to attend them.

The good news is that the organizers have made available the entire set of video lectures in 4K for you to watch. But since Dr. Le did most of them on a blackboard and did not provide any accompanying slides, I decided to put the brief content descriptions of the lectures along with the videos here. Hope this will help you navigate the videos better.

Lecture 1: Neural Networks Review

[JavaScript needed to view this video.]

Dr. Le begins his lecture starting from the fundamentals on Neural Networks if you’d like to brush up your knowledge about them. Otherwise feel free to quickly skim through the initial sections, but I promise there are interesting things later on. You may use the links below to skip to the relevant parts. The links are using an experimental script, let me know in the comments if they don’t work.

Contents


Lecture 2: NNs in Practice

[JavaScript needed to view this video.]

If you have already covered NN in the past then the first lecture may have been a bit dry for you but the real fun begins in this lecture when Dr. Le starts talking about his experiences of using deep learning in practice.

Contents


Lecture 3: Deep NN Architectures

[JavaScript needed to view this video.]

In this lecture, Dr. Le finishes his description on NN architectures. He also talks a bit about how they are being used at Google – for applications in image and speech recognition, and language modelling.

Contents

Talk: Human-Data Interaction

This week I attended a high energy ISP seminar on Human-Data Interaction by Saman Amirpour. Saman is an ISP graduate student who also works with the CREATE Lab. His work in progress project on the Explorable Visual Analytics tool serves as a good introduction to this post:

While this may have some resemblance with other projects such as the famous Gapminder Foundation led by Hans Rosling, Saman presented a bigger picture in his talk and provided motivation for the emergence of a new field: Human-Data Interaction.

Big data is a term that gets thrown around a lot these days and probably needs no introduction. There are three parts of the big data problem, involving data collection, knowledge discovery and communication. Although we are able to collect massive amounts of data easily, the real challenge lies in using it to our advantage. Unfortunately, we do not enough sophistication in our machine learning algorithms that can handle this as yet. You really can’t do without the human in the loop for making some sense of the data and asking intelligent questions. And as this Wired article points out, visualization is the key for allowing us humans to do this. But, our present-day tools are not well suited for this purpose and it is difficult to handle high dimensional data. We have a tough time to intuitively understand such data. For example, try visualizing a 4D analog of a cube in your head!

So now the relevant question that one could ask is that if Human-data interaction (or HDI) really any different from the long existing areas of visualization and visual analytics? Saman suggests that HDI addresses much more than visualization alone. It involves answering 4 big questions on:

  • Steering To help in navigate the high dimensional space. This is the main area of interest for researchers in the visualization area.

But we also need to solve problems with:

  • Sense-making i.e. how can we help the users to make discoveries from the data. Sometimes, the users may not even start with the right questions in mind!
  • Communication The data experts need a medium to share their models that can in-turn allow others to ask new questions.
  • And finally, all of this needs to be done after solving the Technical challenges in building the interactive systems that support all of this.

Tools that can sufficiently address these challenges are the way to go in future. They can truly help the humans in their sense-making processes by providing them with responsive and interactive methods to not only test and validate their hypotheses but also communicate them.

Saman devoted the rest of the talk to demo some of the tools that he contributed towards and gave some examples of beautiful data visualizations. Most of them were accompanied by a lot of gasping sounds from the audience. He also presented some initial guidelines for building HDI interfaces based on these experiences.

Talk: The Signal Processing Approach to Biomarkers

A biomarker is a measurable indicator of a biological condition. Usually it is seen as a substance or a molecule introduced in the body but even physiological indicators may function as dynamic biomarkers for certain diseases. Dr. Sejdić and his team at the IMED Lab work on finding innovative ways to measure such biomarkers. During the ISP seminar last week, he presented his work on using low-cost devices with simple electronics such as accelerometers and microphones, to capture the unique patterns of physiological variables. It turns out that by analyzing these patterns, one can differentiate between healthy and pathological conditions. Building these devices requires an interdisciplinary investigation and insights from signal processing, biomedical engineering and also machine learning.

Listening to the talk, I felt that Dr. Sejdić is a researcher who is truly an engineer at heart as he described his work on building an Asperometer. It is a device that is placed on the throat of a patient to find out when they have swallowing difficulties (Dysphagia). The device picks up the vibrations from the throat and does a bit of signal processing magic to identify problematic scenarios. Do you remember the little flap called the Epiglotis that guards the entrance to your wind pipe, from your high school Biology? Well, that thing is responsible for directing the food into the oesophagus (food pipe) while eating and preventing it from going into wrong places (like the lungs!). As it moves to cover the wind pipe, it records a characteristic motion pattern on the accelerometer. The Asperometer can then distinguish between regular and irregular patterns to find out when should we be concerned. The current gold standard to do these assessments involve using some ‘Barium food’ and X-Rays to visualize its movement. As you may have realized, the Asperometer is not only unobstrusive but also appears to be a safer method to do so. There are a couple of issues left to iron out though, such as removing sources of noise in the signal due to speech or even breathing through the mouth. We can, however, still use it in controlled usage scenarios in the presence of a specialist.

The remaining part of the talk briefly dealt with Dr. Sejdić’s investigations of gait, handwriting processes and preference detection, again with the help of signal processing and some simple electronics on the body. He is building on work in biomedical engineering to study age and disease related changes in our bodies. The goal is to explore simple instruments providing useful information that can ultimately help to prevent, retard or reverse such diseases.