Word storms are a visualization tool for analysing text corpora. Just as a storm is a group of clouds, a word storm is a group of word clouds. Each cloud in the storm represents a subset of the corpus. For example, a storm might contain one cloud per document, or alternatively one cloud to represent all the documents written in each year, or one cloud to represent each track of an academic conference, etc.
Although word clouds are a popular tool for visualizing documents, they are not a good tool for comparing documents, because identical words are not presented consistently across different clouds. In order to make the comparisons easier, we build the clouds coordinately, locating shared words in similar positions and emphasizing the most informative words. In this way, similar documents are represented by visually similar clouds.
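One way to picture "locating shared words in similar positions" is to lay out each cloud independently and then move every word to the average of the positions it received across the clouds that contain it. The sketch below is only an illustration of that idea under stated assumptions: the function names, the `(x, y)` coordinate format and the toy layouts are hypothetical, and a real layout would still need a further step to resolve overlaps between words, which is omitted here.

```python
from collections import defaultdict


def average_shared_positions(layouts):
    """Return the mean (x, y) of each word over all clouds that contain it.

    layouts: list of dicts, one per cloud, mapping word -> (x, y).
    This input format is an assumption made for the example.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for layout in layouts:
        for word, (x, y) in layout.items():
            acc = sums[word]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    return {w: (sx / n, sy / n) for w, (sx, sy, n) in sums.items()}


def coordinate(layouts):
    """Place each word of each cloud at its cross-cloud mean position.

    Words that occur in a single cloud keep their original position.
    Overlap removal is deliberately left out of this sketch.
    """
    mean_pos = average_shared_positions(layouts)
    return [{w: mean_pos[w] for w in layout} for layout in layouts]


# Toy, made-up layouts for two clouds.
layouts = [
    {"materials": (0.0, 0.0), "alloys": (4.0, 2.0)},
    {"materials": (2.0, 4.0), "light": (1.0, 1.0)},
]
coordinated = coordinate(layouts)
```

After this step the shared word `materials` sits at the same coordinates in both clouds, while `alloys` and `light`, which appear in only one cloud each, stay where they were.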
Standard clouds are difficult to compare. In the figure, we represent four articles: three about materials and one about mathematics. The clouds all look very different, and it is hard to check whether a given word appears in more than one of them.
A Coordinated Word Storm is easier to analyse. The figure shows the same documents as before. Because shared words appear in similar locations with the same color and orientation, words are easier to find and similar documents are represented by similar clouds. In this example, it is clear from the clouds that the first three articles are related and that the fourth is the most different. Moreover, because the transparency of each color reflects the importance of the word, the informative terms stand out the most. In this case, words such as 'materials', 'research' and 'development' are not very important because they are very common, while words such as 'light', 'alloys', 'composite' and 'theory' give us more information.
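The link between how common a word is and how transparent it is drawn can be sketched with an inverse-document-frequency style weight. This is only an illustration: the weighting formula, the `min_alpha` parameter, the function name and the toy word lists are assumptions made for the example and are not taken from the project's code.

```python
import math


def word_alpha(clouds, min_alpha=0.2):
    """Map each word to an opacity in [min_alpha, 1.0].

    clouds: list of word lists, one per cloud (hypothetical input format).
    Words that occur in every cloud get min_alpha (most transparent);
    words that occur in only one cloud get 1.0 (fully opaque).
    """
    n = len(clouds)
    doc_freq = {}
    for words in clouds:
        for w in set(words):
            doc_freq[w] = doc_freq.get(w, 0) + 1

    # Largest possible idf, reached by a word that is in exactly one cloud.
    max_idf = math.log(n) if n > 1 else 1.0

    alphas = {}
    for w, df in doc_freq.items():
        idf = math.log(n / df)  # 0 when the word is in every cloud
        alphas[w] = min_alpha + (1.0 - min_alpha) * (idf / max_idf)
    return alphas


# Toy, made-up word lists for four clouds.
clouds = [
    ["materials", "research", "light"],
    ["materials", "research", "alloys"],
    ["materials", "research", "composite"],
    ["materials", "research", "theory"],
]
alphas = word_alpha(clouds)
```

With this toy input, `materials` and `research` occur in every cloud and receive the minimum opacity, while `light`, `alloys`, `composite` and `theory` each occur in one cloud and are drawn fully opaque.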
Word Storms can be used in different scenarios. Here, we show some real deployment examples:
ICML 2012: The International Conference on Machine Learning 2012 used a word storm on its main website to compare the articles in each session.
Analyse your documents by creating and customizing your own word storms. You can choose how the clouds look by setting the font or the number of words, and you can also decide how the terms are selected and how the important ones are emphasized.
Download the code from GitHub.
This project was developed by Quim Castellà and Charles Sutton at the University of Edinburgh.
More information can be found in the article:
Word Storms: Multiples of Word Clouds for Visual Comparison of Documents.
Quim Castellà and Charles Sutton
[ arxiv ]
This research project was made possible by funding from the Engineering and Physical Sciences Research Council [grant number EP/J00104X/1].