Beautiful Word Clouds

2008/06/26 By Rob 16 Comments

We’ve all seen tag clouds by now, the visualization technique that shows the importance (however defined, but typically by prevalence) of a word by assigning a proportionately sized font.
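As a rough illustration of the "proportionately sized font" idea, here is a minimal sketch that counts word frequencies and maps them linearly onto a range of point sizes. The function name `font_sizes` and the parameters `min_pt`, `max_pt`, and `top_n` are made up for this example; real tag cloud tools may scale sizes differently (logarithmically, for instance).

```python
import re
from collections import Counter


def font_sizes(text, min_pt=10, max_pt=72, top_n=50):
    """Map the top_n most frequent words in text to a font size in points."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    sizes = {}
    for word, n in counts:
        # Linear interpolation between min_pt and max_pt.
        # If every word has the same count, give them all max_pt.
        scale = (n - lowest) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes
```

For example, `font_sizes("whale whale whale sea sea ship")` gives the most frequent word the largest size and the least frequent the smallest.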

But now along comes a tool that treats these clouds as art. Wordle’s “Beautiful Word Clouds” is quite addictive, allowing you to enter the raw text and then play around with layout algorithms, fonts and coloring schemes to produce some very nice-looking clouds. The author, Jonathan Feinberg, works here at IBM, a fact I did not discover until I had already wasted hours playing with the tool. So maybe I can count this as work now?

Here are a few examples of word clouds formed by analyzing three different texts. Can you guess the identity of the three texts?

Some of my wish-list items are:

Apply a stemming algorithm to conflate words with the same root. So in the last example, “standard” and “standards” are counted separately, when they are probably best counted as the same word.
Auto-generate an image map associated with the cloud.
Export to PNG (even if just written temporarily to the server, I can download it from there).
I’d love to read a paper on how the layout algorithms work.
What would happen if you combined Kohonen self-organizing maps with word clouds? Arrange the words so their proximity in the cloud is correlated with co-occurrence in the text.
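To make the first wish-list item concrete, here is a minimal sketch of conflating words before counting them. It is not the Porter algorithm or anything Wordle does; `naive_stem`, `conflated_counts`, and the tiny `SUFFIXES` list are hypothetical stand-ins for a real stemmer, which has far more rules.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list. A real stemmer handles
# many more endings and exceptions than this.
SUFFIXES = ("ing", "ly", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflated_counts(words):
    """Count words after lowercasing and naive suffix stripping."""
    return Counter(naive_stem(w.lower()) for w in words)
```

With this sketch, “standard” and “standards” land in the same bucket. Because only endings are stripped, words that merely share a trailing substring, such as “XML” and “OOXML”, stay separate.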

Comments

Matthew Raymond says

2008/06/26 at 8:11 am

The word “open” really should be larger in that last cloud.

trader.name says

2008/06/26 at 5:29 pm

The first cloud is obviously from “Moby Dick”; the second one appears to be from a collection of Shakespeare’s sonnets. The third appears to be from an OOXML-vs-ODF rant of some kind.

Anonymous says

2008/06/26 at 8:54 pm

I see Moby Dick and the third is probably this blog. But what’s the 2nd one? I want to say the Bible, but it’s missing some obvious words that seem to rule that out.

Peter says

2008/06/26 at 11:21 pm

Number 2 looks like Shakespeare sonnets, or possibly John Donne to me.

Anonymous says

2008/06/26 at 11:50 pm

Number 2, William Shakespeare, Romeo and Juliet?

Anonymous says

2008/06/27 at 2:23 am

If I had to guess on the second one, I would say Song of Solomon.

Anonymous says

2008/06/27 at 3:33 am

The first is *obviously* Moby Dick.

The second, I think, is Shakespeare’s sonnets.

And I agree the third is quite possibly Rob’s blog.

Anonymous says

2008/06/27 at 3:44 am

Is the second one Shakespeare?

Nate says

2008/06/27 at 5:34 am

The 2nd one looks like Shakespeare to me. Since there are no visible character names, I’m guessing that’s the collected sonnets.

Welcome back, Rob. We missed you.

Konrad says

2008/06/27 at 6:36 am

Shakespeare, isn’t it?

Rob says

2008/06/27 at 8:48 am

The answer is:

1) Moby Dick

2) Shakespeare’s Sonnets (here I increased the words shown to 1000, resulting in the denser cloud)

3) This blog

Anonymous says

2008/06/27 at 6:00 pm

I see trouble with a stemming algorithm – would it think “XML” and “OOXML” were the same word? Scary thought… :)

Rob says

2008/06/27 at 8:25 pm

Stemming algorithms usually have enough language smarts to avoid things like that. They are looking for grammatical endings like -ing, -ly, etc., and conflating words with these suffixes with their roots.

For quick and dirty processing, I’ve always used the Porter Stemmer, which I see now is online.

Doug Mahugh says

2008/07/02 at 12:55 am

These Wordle summaries are cool. I like the way they summarize the concepts in a large document and provide a high-level overview that’s often pretty accurate.

For example, if you look at the words in the “Moby Dick” image and they sound interesting to you at first glance, there’s a good chance you’ll find the book interesting on some level. If not, you probably won’t.

Sean O'Donnell says

2011/06/15 at 5:42 am

It’s not as advanced as Wordle, but you might find http://wispy.me fun to play with as well.


Trackbacks

ODF 1.2 Word Clouds says:

2010/07/29 at 4:57 pm

[…] I thought of using Jonathan Feinberg’s excellent Wordle applet (which I wrote about a while back). This applet creates a word cloud, based on word frequency of text you feed it. As a torture […]
