Learning from multiple annotators

I recently prepared a deck of slides for my machine learning course. In the presentation, I talk about some recently proposed methods for learning from multiple annotators. In these methods, we do not assume that the labels we get from the annotators are the ground truth, as we do in traditional machine learning; instead, we try to recover the “truth” from noisy data.

There are two main directions of work in this area. One focuses on finding the consensus labels first and then doing traditional learning, while the other learns a consensus model directly. In the second approach, the consensus labels may be estimated during the process of building the classifier itself.
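As a minimal illustration of the first direction, here is a sketch that forms consensus labels by simple majority voting and then trains an ordinary classifier on them; the annotation matrix and features below are made-up toy data:

```python
# A minimal sketch of the "consensus labels first" approach:
# estimate a label per item by majority vote, then train a standard
# classifier. Rows of `annotations` are items, columns are annotators,
# and -1 marks a missing annotation (toy data, for illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def majority_vote(annotations, n_classes):
    """Return one consensus label per row, ignoring missing (-1) entries."""
    consensus = np.empty(annotations.shape[0], dtype=int)
    for i, row in enumerate(annotations):
        votes = np.bincount(row[row >= 0], minlength=n_classes)
        consensus[i] = votes.argmax()
    return consensus

annotations = np.array([[1, 1, 0],
                        [0, 0, -1],
                        [1, -1, 1],
                        [0, 1, 0]])
features = np.array([[2.0, 1.1], [0.3, 0.4], [1.8, 2.2], [0.5, 0.2]])

labels = majority_vote(annotations, n_classes=2)   # [1, 0, 1, 0]
model = LogisticRegression().fit(features, labels)
print(model.predict(features))
```

The second direction would instead treat the true labels as latent variables and estimate the annotator reliabilities and the classifier jointly, for example with an EM-style procedure.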

Here are the slides for the presentation. I would be happy to receive your comments and suggestions.


Towards Making Kivy Apps Accessible

After last week’s discussion with tshirtman about making Kivy apps more accessible, I spent this week exploring the accessibility options on the devices I own. The idea was to investigate and plan a set of common features that we could implement for Kivy.

I ended up playing with all the settings that I could find under the accessibility menus on an Android phone and a MacBook. I will soon repeat this exercise with an iPhone as well.

In comparison to my Galaxy running Android 4.4.2, these features were easier to use and more responsive to interact with on my Mac running OS X 10.9. Even the tutorials were simpler to follow and avoided statements like:

“When this feature is on, the response time could be slowed down in Phone, Calculator and other apps.”

I will set my rantings aside for now and instead focus on how we could support these features within Kivy apps. Following Apple’s way of organizing them, accessibility features can be grouped into three broad categories:

Here's TalkBack on Android reading out my PIN to me!
  • Seeing: This includes display settings such as colors and contrast, making text larger and zooming the screen. A lot of these settings come free of cost to app developers. However, an important feature in this category is the screen reader (for example, VoiceOver and TalkBack). This is the main area of interest to us, as the onus of making this feature work correctly rests with us and the app developers. I will get back to this after we finish classifying the features.
  • Hearing: We must make sure all our alert widgets in Kivy are accompanied by visual cues (such as a screen flash) along with audio indicators. Another feature that might be of interest to us would be to provide a subtitles option in our video player widget.
  • Interacting: This includes features like the assistant menu on Android, which provides easy access to a limited number of commonly used functions. Most of these features are again managed and controlled by the operating systems themselves. Apart from a few design choices to make sure that the app doesn’t interfere with their functions, there’s nothing that app developers have to do to support them.

So the most important missing piece in the Kivy framework is support for the screen-reading apps. These apps are designed to provide spoken feedback for whatever the user selects (by touch or pointer) and activates. For example, if I hover over a menu item or select a button on the screen, the app must describe what it is and, if possible, what it does.

While settings such as zoom, larger font sizes etc. are already taken care of by the frameworks that Kivy builds upon, we must provide explicit support for the screen readers. Here’s an example of a hello world Kivy app and how it is seen by the screen readers. There is nothing, apart from the main window’s buttons, i.e. “Close”, “Minimize” and “Zoom”, that is available to the VoiceOver program to describe:

There is no meta information available to the screen-readers to describe the interface to the user.
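For reference, the app in question is essentially the stock hello world; a minimal version is sketched below, and nothing in it, apart from the window chrome, is exposed to VoiceOver:

```python
# A minimal Kivy "hello world" app of the kind described above. The button
# is drawn entirely by Kivy, so a screen reader only sees the native window
# chrome ("Close", "Minimize", "Zoom") and nothing about the button itself.
from kivy.app import App
from kivy.uix.button import Button

class HelloApp(App):
    def build(self):
        return Button(text='Hello World')

if __name__ == '__main__':
    HelloApp().run()
```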

The main challenge here is that none of the Kivy widgets are selectable at this point. Not only does that make it difficult to use them with screen readers, it also makes it impossible to navigate them using a keyboard. We must provide screen-reader support along with our planned feature for providing focus support for Kivy widgets.

Along with on_focus, we must send descriptive feedback that the screen readers can use as a description of the widget. While most of the native OS UI elements have such content descriptors already defined, our Kivy widgets lack this information. We must make an effort to ensure that common widgets such as text inputs, select options etc. have a description consistent with the one natively provided by the platform.
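To make this concrete, here is a rough sketch of what such a content descriptor could look like on the Kivy side. The accessibility_description property and the announce-on-focus behavior below are hypothetical and used purely for illustration; they are not part of Kivy or Plyer today:

```python
# Hypothetical sketch only: 'accessibility_description' is not a real Kivy
# property. The idea is that a focusable widget carries a short description
# that a platform layer could hand to the native screen reader on focus.
from kivy.properties import StringProperty
from kivy.uix.behaviors import FocusBehavior
from kivy.uix.button import Button

class AccessibleButton(FocusBehavior, Button):
    accessibility_description = StringProperty('')

    def on_focus(self, instance, focused):
        if focused:
            # A real implementation would pass this string to VoiceOver or
            # TalkBack through a platform-specific call instead of printing.
            print('Announce:', self.accessibility_description or self.text)
```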

So a major portion of this implementation must be dealt with within Kivy itself. While we have previously adopted an approach of implementing the lowest common set of features in other Plyer facades, it would be a good idea to implement the content descriptors (or accessibility attributes) in a way that covers all our target platforms. Plyer could then take over from there to make the platform-specific calls to support the native screen reading apps. We would need to provide a mechanism to transform these attributes into a form relevant to each platform separately. We could safely ignore the extra set of attributes at this point.

In this post I have only outlined the first steps that we need to take to make Kivy apps more accessible. I will work on figuring out the finer details of implementing them in the coming weeks. Hopefully this will help trigger a meaningful discussion within the community, and we will have a chance to hear other opinions on this as well.

 

This is a two part post. You can continue reading on the topic here — Towards Making Kivy Apps Accessible – 2.

Talk: Human-Data Interaction

This week I attended a high-energy ISP seminar on Human-Data Interaction by Saman Amirpour. Saman is an ISP graduate student who also works with the CREATE Lab. His work-in-progress project, an Explorable Visual Analytics tool, serves as a good introduction to this post:

While this may bear some resemblance to other projects, such as the famous Gapminder Foundation led by Hans Rosling, Saman presented a bigger picture in his talk and provided motivation for the emergence of a new field: Human-Data Interaction.

Big data is a term that gets thrown around a lot these days and probably needs no introduction. There are three parts to the big data problem: data collection, knowledge discovery and communication. Although we are able to collect massive amounts of data easily, the real challenge lies in using it to our advantage. Unfortunately, our machine learning algorithms are not yet sophisticated enough to handle this on their own. You really can’t do without a human in the loop for making sense of the data and asking intelligent questions. And as this Wired article points out, visualization is the key to allowing us humans to do this. But our present-day tools are not well suited for this purpose, and it is difficult to handle high-dimensional data. We have a tough time intuitively understanding such data. For example, try visualizing a 4D analog of a cube in your head!

So the relevant question one could ask is: is Human-Data Interaction (or HDI) really any different from the long-existing areas of visualization and visual analytics? Saman suggests that HDI addresses much more than visualization alone. It involves answering four big questions, about:

  • Steering: To help navigate the high-dimensional space. This is the main area of interest for researchers in the visualization community.

But we also need to solve problems with:

  • Sense-making: i.e., how can we help users make discoveries from the data? Sometimes, the users may not even start with the right questions in mind!
  • Communication: The data experts need a medium to share their models that can, in turn, allow others to ask new questions.
  • And finally, all of this needs to be done after solving the Technical challenges in building the interactive systems that support all of this.

Tools that can sufficiently address these challenges are the way to go in the future. They can truly help humans in their sense-making processes by providing responsive and interactive methods to not only test and validate their hypotheses but also communicate them.

Saman devoted the rest of the talk to demoing some of the tools he has contributed to and gave some examples of beautiful data visualizations, most of them accompanied by a lot of gasping sounds from the audience. He also presented some initial guidelines for building HDI interfaces based on these experiences.

The Story of Digital Color

Earlier this year, I did some work on digital color management for a project. During my readings for the project, I accumulated a lot of interesting articles that I thought I could share. The festive colors around me inspired me to finally write about them (and also in case I need to refer to them again :D). In this post, I have put together a collection of articles, papers, wikis, comics, podcasts … that you can use to find out more about the subject. Let’s begin our story all the way back at how we see and perceive colors:

I. The background

Dispersion of light - Prism experiment.
The ‘splitting of white light into seven colors’ experiment. In reality, you see more of a continuous spectrum than the discrete seven colors shown above.

Before we get started on digital color, let’s refresh some high-school science topics. It is kind of mind-boggling to think about, but the concept of color is something that you make up in your own head. Fundamentally, colors are electromagnetic radiation with frequencies in the visible range. Our eye’s retina is layered with mainly two types of photoreceptor cells: the rods (black-and-white vision) and the cones. The cones enable us to see color and are of three types: rho (more sensitive to longer wavelengths), gamma (medium) and beta (short). At this point I would like to introduce you to the applets designed by Prof. Mark Levoy. I’d strongly recommend playing around with them, at least the ones on the Introduction to color theory page.

Radiation with different wavelengths (or frequencies) excites these color receptors in our eyes to varying levels, which the brain then processes into our perception of color. This tri-stimulus response in humans is also what gives rise to metamerism: very different mixes of wavelengths can produce the same perceived color. Reproducing color this way, from just three primaries, is cheaper to process and easier to control. A similar technique is exploited in building the displays of our computer screens and mobile phones – the objects that we cherish the most and spend most of our time staring at. They also use three types of sources to produce all the colors on the display.

There may be differences in how we see the world depending on how many types of color receptors we have. Most men have 3 types of cones (8% have even fewer types and are color-blind), while women can have up to four due to genetic factors. This Oatmeal comic beautifully illustrates how the number of receptor types affects color vision in, erm.. the Oatmeal way. As a bonus, you also get to find out how the Mantis Shrimp’s vision is powered by technology superior to ours. [1] All of this, and the search for a tetrachromat woman, can be found in this Radiolab podcast on colors.

Now that you understand that beauty is indeed in the eye, let’s get a little further into color theory and move on to our next topic.

II. Color Theory Basics

In the previous section, we learned that the illusion of color can be created by three primary colors. Based on this we have two types of color systems:

a) Additive: We add varying quantities of the primaries to get other colors. If we are using R, G and B as our primaries, we have R+G = yellow, R+B = magenta and B+G = cyan. This kind of color mixing is used in digital displays, where we have an individual light source for each primary.

b) Subtractive: A paint or ink based medium follows such a system. It is named so because we perceive the color of an object as the light that it reflects back while absorbing the rest. This can be imagined as subtracting colors from the incident light. An example of this system could have Cyan, Magenta and Yellow as the primaries. In the popular CMYK color format, we also add black to increase contrast and for other practical reasons.
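To make the two systems concrete, here is a toy sketch using idealized, normalized RGB triples; real inks and displays are messier, so treat it purely as an illustration:

```python
# Toy illustration of additive vs. subtractive mixing with RGB triples
# in [0, 1]. Additive mixing adds light sources; subtractive mixing keeps
# only the light that every ink reflects (a simple multiplicative model).

def additive_mix(*lights):
    """Add light sources together, clamping each channel at 1.0."""
    return tuple(min(1.0, sum(c[i] for c in lights)) for i in range(3))

def subtractive_mix(*inks):
    """Start from white light and let each ink absorb what it doesn't reflect."""
    result = (1.0, 1.0, 1.0)
    for ink in inks:
        result = tuple(r * i for r, i in zip(result, ink))
    return result

RED, GREEN, BLUE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
CYAN, MAGENTA, YELLOW = (0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)

print(additive_mix(RED, GREEN))       # (1.0, 1.0, 0.0) -> yellow
print(subtractive_mix(CYAN, YELLOW))  # (0.0, 1.0, 0.0) -> green
```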

The other type of system that you may have heard about deals with Hue, Saturation and Value (HSV). These three dimensions are supposed to describe how we actually understand colors:

Representation of an HSV color system. Hue is depicted as an angle around the circle, saturation along the radial line and value along the vertical axis through the center.
  1. Hue: name of the color – red, yellow, blue
  2. Saturation: a color’s ‘strength’. A neutral gray would have 0% saturation, while a saturated apple red would be 100%. Pink would be an example of something in between – an unsaturated red.
  3. Value: This deals with the intensity of light. You can think of it as seeing colors in dim and bright light. You don’t see any color in very dim environments, nor when it is blindingly bright.

This color system is more suited for image processing and manipulation uses.
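If you want to play with this representation, Python’s standard colorsys module converts between RGB and HSV; a quick sketch:

```python
# RGB <-> HSV conversion with Python's standard colorsys module.
# All channels are normalized to [0, 1]; hue is a fraction of a full turn,
# so multiply by 360 to get degrees.
import colorsys

h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # pure red
print(h * 360, s, v)                           # 0.0 1.0 1.0

pink = colorsys.hsv_to_rgb(0.0, 0.4, 1.0)      # same hue, lower saturation
print(pink)                                    # (1.0, 0.6, 0.6) -> pink
```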

III. Color Spaces

Finally we have arrived at the computer science portion of our discussion. Since computer displays have to process and present all of these different types of colors, we must find a good way to represent them as well. Most color spaces have 3 to 4 components (or channels or dimensions). A complete digital image reproduction system would involve three phases: 1) acquiring the image (say using a camera), 2) transmitting it (saving it in an image format / using an A/V cable etc.) and finally 3) displaying it (on a screen, printing or projecting it). Some color formats are more suited for a particular phase in this system. These color spaces would be based on one of the color systems that we learned about in the section above.

The simplest of the color spaces would be a Gray color space. It only has a single channel with values varying from black to white. The common 3-channel color space families are:

RGB family: These are mainly used in displays and scanners. Members of this family include the sRGB and Adobe RGB color formats, among others. These formats are specified by international standards from various organizations.

YUV / YCbCr / YCC family: These are the most unintuitive types of color spaces. They were designed with transmission efficiency and efficient storage in mind.
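As a rough illustration of this family, here is the widely used BT.601 full-range (JPEG-style) conversion from 8-bit RGB to YCbCr; treat it as a sketch rather than production code:

```python
# BT.601 full-range (JPEG-style) RGB -> YCbCr for 8-bit values.
# Y carries the brightness, while Cb and Cr carry color differences,
# which is why chroma can be stored or transmitted at reduced resolution.

def rgb_to_ycbcr(r, g, b):
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    # For storage, the results would be rounded and clamped to [0, 255].
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # pure red: modest Y, Cr far above 128
```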

Device Independent Colors: All the color spaces that we have discussed so far may produce varied results on different output devices unless calibrated. Different devices have different ranges of colors that they can produce. [2] As a result, they are called device-dependent color spaces. To counter this, some clever folks at the CIE developed imaginary color spaces (although not useful for making any output devices [3] ) that specify color as perceived by a ‘standard’ human. Examples of these color spaces include CIE XYZ, L*a*b and L*u*v.

The Apple developer article on color management is a good source to read more on this topic.

IV. Color Display: Techniques and Terminologies

Now that we have covered most of the basics, let’s briefly go over the other common terms about color reproduction that you may have encountered.

  • Color Depth and Bits per pixel (BPP)
    These parameters define the number of bits used to describe a single pixel’s color. If you are using 8-bit color, you will use 3 bits (or 8 levels) each for R and G and 2 bits for B, assuming that your three channels are red, green and blue. Similarly, you can have other color depths like 16-bit and 24-bit (True Color); 24-bit color can represent 256 shades each of red, green and blue. Modern displays support something known as deep color (up to 64-bit), with gamuts comprising a billion or more colors. Although the human eye cannot distinguish between that many colors, we need these bigger gamuts for high dynamic range imaging. This also affects image perception, since we do not perceive red, green and blue with equal sensitivity. So even though we may be producing more colors on the screen, we may not have a sufficient number of shades for the colors we are more sensitive to. Having a larger gamut covers all those shades that our eyes can distinguish but that need more information to be represented in digital space.
  • Color Temperature
    Color temperature is derived from the color of radiation emitted by a black body when heated to a particular temperature. Hence, it is commonly specified in units of Kelvin. Our main interest here is adjusting the white point, or the chromaticity of the color reproduced by equal red, green and blue components in an output device. This lets us make the colors appear “warmer” or “cooler”.
  • Dynamic Range and Quantization
    You may have heard about newer camera phones having an HDR mode. These phones are able to process a photograph so that it retains detail both in the shadows and in the brighter regions of the frame. The dynamic range of the image is what they are referring to here. Dynamic range is the ratio of the brightest to the darkest light levels. These levels are quantized into several intensity levels within this range. An 8-bit device would have 256 such intensity levels.
  • Gamma
    We are able to better distinguish between color intensities at lower levels.

    Our eye is sensitive to the various intensities of colors in a non-linear or power relationship. This allows us to see clearly both indoors (or at night) and outdoors in bright daylight. [4] This leads to something known as gamma correction in computer graphics. You can map the linear levels of your images to a curve of your choice. This must be done according to the characteristics of the output device, like the monitor. When you adjust the brightness, contrast or gamma settings on your display, you are essentially manipulating this curve. This GPU Gems article from Nvidia’s Developer Zone tells you more about gamma than you may need to know. A small sketch of the idea follows this list. I think I’ll stop here, but this topic probably deserves more space than a short paragraph. There are a couple of excellent articles by Charles Poynton on this, such as The Rehabilitation of Gamma and the Gamma FAQ.

  • Dithering
    Our eyes tend to smear adjacent pixel colors together. This phenomenon is exploited to simulate a higher color depth than what could be supported by an output device or image format. You can see some examples of dithering here. In another type of dithering, known as temporal dithering, colors are changed in consecutive frames to create a similar illusion.
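To make the gamma idea a bit more concrete, here is a minimal sketch using a plain 2.2 power curve rather than the exact sRGB transfer function (which adds a small linear segment near black), so treat it as an approximation:

```python
# Approximate gamma encoding/decoding with a plain 2.2 power curve.
# Gamma encoding spends more of the available codes on dark values,
# where our eyes are better at telling intensities apart.

GAMMA = 2.2

def encode(linear, bits=8):
    """Map a linear light level in [0, 1] to a quantized gamma-encoded code."""
    levels = 2 ** bits - 1
    return round(linear ** (1 / GAMMA) * levels)

def decode(code, bits=8):
    """Map a quantized gamma-encoded code back to a linear light level."""
    levels = 2 ** bits - 1
    return (code / levels) ** GAMMA

print(encode(0.05), encode(0.5), encode(1.0))  # roughly 65, 186, 255
print(decode(128))                             # the middle code is only ~0.22 linear
```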

V. The sRGB color format

Although now a bit dated, most consumer devices still use sRGB as the default color space. When you see sRGB content on compatible devices, you will enjoy the same colors. You can find out more about what goes into making a standard color space here: http://www.w3.org/Graphics/Color/sRGB.html. After reading about the fundamentals of digital color, I hope you have a good understanding of how to read a specification like that.

This is a very short post considering the breadth of topics that it deals with. I have tried to highlight the human factors that influenced the development of color theory and technologies throughout this post. They sure were able to pique my curiosity to find out more. I hope that you’ll also like this collection of resources for reading about color. I would be glad if you could point out any errors and typos here.

Image Credits: Munsell System. © 2007, Jacob Rus

Footnotes

  1. Update: Study reveals that Mantis shrimp uses color to communicate! http://www.livescience.com/42797-mantis-shrimp-sees-color.html?cmpid=514636 ^
  2. More info here: http://graphics.stanford.edu/courses/cs178-10/applets/colormatching.html ^
  3. Device independent colors are useful for studying the color theory. They are also used as intermediaries when converting between different colors spaces. ^
  4. This could be similar to other psychometric curves: http://en.wikipedia.org/wiki/Psychometric_function ^

The Stroop Test

The Stroop task is among the most popular research tools used in experimental psychology. It involves creating a conflict situation using words and colors. The original experiment was conducted by John Ridley Stroop way back in 1935. In these tasks, the participants are asked to read color words – like “red“, “green“, “blue” etc. – aloud. In his first experiment, Stroop compared the reading times of words in two conditions: a) the neutral condition, where words are presented in normal black ink on a white background; and b) the incongruent condition, with incompatible color-word combinations, e.g. the word “red” printed in a different ink color. It was during his second experiment that he found that color-naming times were much faster for colored rectangles than for incongruent color-word combinations.

Other Stroop paradigms have also compared incongruent with congruent conditions. The Wikipedia article on the subject summarizes their results:

Three experimental findings are recurrently found in Stroop experiments:

  1. A first finding is semantic interference, which states that naming the ink color of neutral stimuli (e.g. when the ink color and word do not interfere with each other) is faster than in incongruent conditions. It is called semantic interference since it is usually accepted that the relationship in meaning between ink color and word is at the root of the interference.
  2. Semantic facilitation, explains the finding that naming the ink of congruent stimuli is faster (e.g. when the ink color and the word match) than when neutral stimuli are present (e.g. when the ink is black, but the word describes a color).
  3. The third finding is that both semantic interference and facilitation disappear when the task consists of reading the word instead of naming the ink. It has been sometimes called Stroop asynchrony, and has been explained by a reduced automatization when naming colors compared to reading words.

I needed a mobile (Yay, HTML5!) Stroop task as one of my experiments in a class project. First, I tried using the Google Speech API, but it was failing to recognize my own speech inputs, so I resorted to using regular buttons instead. I also wanted to vary the difficulty of the task, so I added an option for more color variations (more buttons, more targets, Fitts’ law …). Here’s what I could come up with in a couple of hours:
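The heart of the task is just generating word-ink pairs; my implementation was in HTML5/JavaScript, but a rough sketch of the same logic (in Python, purely for illustration) would look like this:

```python
# Rough sketch of Stroop trial generation: each trial pairs a color word
# with an ink color, and incongruent trials force the ink to differ from
# the word. Adding more colors makes the task harder (more buttons/targets).
import random

COLORS = ['red', 'green', 'blue', 'yellow']

def make_trial(congruent):
    word = random.choice(COLORS)
    ink = word if congruent else random.choice([c for c in COLORS if c != word])
    return {'word': word, 'ink': ink}

trials = [make_trial(congruent=random.random() < 0.5) for _ in range(20)]
print(trials[:3])
```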

Talk: Look! New stuff in my room!

Yesterday, I attended a talk by Prof. David Forsyth. One of the perks of being a student is being able to attend seminars like these. The talk was mostly about his work on understanding pictures of rooms and inserting objects into them. It was a light talk and he did not go too deep into the details of his paper. Apart from that, he gave an overview of the current work being done by computer vision researchers and his vision (pun intended) for the future. Overall, it was a fun talk to attend and a Friday evening well spent :D.

Here is a video showing a demo of the method, in case you are curious:

Talk: Socially Embedded Search

This week I attended a full house talk by Dr. Meredith Ringel Morris on Socially Embedded Search Engines. Dr. Morris put together a lot of material in her presentation, and we (the audience) could appreciate how she presented all of it, with great clarity, in just one hour. But I think it would be tricky for me to summarize everything in a short post. Do check out Dr. Morris’ website to find out more about the subject.

Social search is the term for when you pose a question to your friends using one of the social networking tools (like Facebook or Twitter). There is a good chance that you have already been using “social search” without knowing the term for it. So, why would you want to do that instead of using the regular search engines you have access to? It may be simpler to ask your friends at times, and they could also provide direct, reliable and personalized answers. Moreover, this is something that could work alongside traditional search engines. Dr. Morris’ work gives some insight into the areas where search engineers have opportunities to combine traditional algorithmic approaches with social search. She tells us what kinds of questions are asked more often in a social search and which of them are more likely to get a useful answer. She goes on to discuss how the topics of these questions vary among people from different cultures.

I really liked the part about “search buddies” during the talk. In their paper, Dr. Morris and her colleagues proposed implanting automated agents that post relevant replies to your social search queries. One type of agent tries to figure out the topic of the question and recommends friends who seem to be interested in that area by looking at their profiles, while another takes an algorithmic approach and posts a link to a web page that is likely to contain an answer to the question. It was interesting to learn how people reacted to the involvement of these automated agents. While some of the people in the experiment appreciated being referred to for an answer, many found the agents obnoxious when they failed to identify the context correctly. In her more recent work, Dr. Morris has tried to solve these problems by recruiting real people from Mechanical Turk to answer questions on Twitter. Such an approach can respond to people’s questions in a smarter way by collecting information from several people. It can then respond in the form of a polling result and quote the number of people recommending a particular answer. It can also take into account any replies that the participant has already received from one of their followers; the automated agent would then present that answer for an opinion poll among the Turkers. Although such a system could provide more intelligent replies than ‘dumb’ algorithms, it may still fall short of responses from your friends, which would certainly be more personalized and better placed contextually. During the QnA session, one of the audience members raised a question (with a follow-up by Prof. Kraut) about comparing these methods with question-and-answer websites such as Quora. While these sites may not provide results that are as personalized, they will certainly do better at drawing the attention of people interested in similar topics. It may not always be possible to find somebody among your friends to answer a question on a specialized topic.

Dr. Morris’ talk provided some really good backing for some of the recent steps taken by search engines like Bing (having ties with both Twitter and Facebook), Google (and the Google plus shebang) and also Facebook (with Graph Search) in this direction. It would be interesting to see how social computing research shapes the future of internet search.

Further Reading

You can find Dr. Morris’ publications on this topic here: http://research.microsoft.com/en-us/um/people/merrie/publications.html