We’ve all seen tag clouds by now, the visualization technique that shows the importance (however defined, but typically by prevalence) of a word by assigning a proportionately sized font.
But now comes along a tool that treats these clouds as art. Wordle’s “Beautiful Word Clouds” is quite addictive, allowing you to enter the raw text and then play around with layout algorithms, fonts and coloring schemes to produce some very nice looking clouds. The author — Jonathan Feinberg — works here at IBM, a fact I did not discover until I had already wasted hours playing with the tool. So maybe I can count this as work now?
Here are a few examples of word clouds formed by analyzing three different texts. Can you guess the identity of the three texts?
Some of my wish-list items are:
- Apply a stemming algorithm to conflate words with the same root. So in the last example, “standard” and “standards” are counted separately, when they are probably best counted as the same word.
- Auto generate an image map associated with the cloud
- Export to PNG (even if just written temporarily to server, I can download it from there)
- I’d love to read a paper on how the layout algorithms works
- What would happen if you combined Kohonen self-organizing maps with word clouds? Arrange the words so their proximity in the cloud was correlated with co-occurrence in the text.