An Antic Disposition

Archives for March 2011

Twitter Powers of Ten

2011/03/25 By Rob 19 Comments

Time-Based Profiling

Before any of this will make sense, I ask you to imagine doing a survey of your local shopping mall or other busy commercial shopping district.  You want to know where people congregate, where they spend most of their time.  Is it in a particular shop, in the food court, or in some dark corner of the parking garage?

There are a few ways of solving this problem:

  • You could have a video capture of the entire complex, digitize that data and map where everyone is.  Aggregate over a representative time interval (days?  weeks?) and you will have a good idea where people hang out.  The downside of this approach is that it requires an expensive and complex camera system,  and generates a massive amount of data.
  • Another approach would be to do this with a series of still cameras that cover the entire mall.  Take a snapshot at periodic intervals.  A bit less expensive, but it still requires “getting everyone in the frame”.
  • Yet another approach is to sample both by time and by location.  So don’t install cameras all over the mall.  Have one hand-held camera, and take a picture in the book store one minute, another picture in the food court another minute, etc.  Aim for coverage over time and locations.  And repeat, repeat, repeat.  Take thousands of samples.  This is low tech on the data capture side, but can still generate massive amounts of data.

So three approaches.  Obviously some approaches are easier to implement for the owner of the mall.  But only the last one is doable by the average citizen.
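To make the third approach concrete, here is a minimal Python simulation of it. Everything in it is an assumption made for illustration: the location names, the "true" average headcounts, and the noise model are invented, and it is only a sketch of sampling by time and location, not a description of any real survey.

```python
import random
from collections import defaultdict

# Hypothetical "true" average headcounts, invented for illustration only.
TRUE_MEAN = {"food court": 40, "book store": 10, "parking garage": 2}

def take_snapshot(location, rng):
    """Simulate one hand-held photo: a noisy headcount centred on the true mean."""
    mean = TRUE_MEAN[location]
    return rng.randint(mean // 2, mean + mean // 2)

def survey(num_samples, seed=0):
    """Visit one random location per time step and average the headcounts seen there."""
    rng = random.Random(seed)
    totals = defaultdict(int)
    visits = defaultdict(int)
    locations = list(TRUE_MEAN)
    for _ in range(num_samples):
        place = rng.choice(locations)
        totals[place] += take_snapshot(place, rng)
        visits[place] += 1
    return {place: totals[place] / visits[place] for place in visits}

estimates = survey(3000)
for place, average in sorted(estimates.items(), key=lambda item: -item[1]):
    print(f"{place}: about {average:.1f} people per snapshot")
```

No single snapshot says much, but with thousands of them the per-location averages settle close to the underlying values, which is the whole point of "repeat, repeat, repeat".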

This is essentially the situation we find ourselves in with Twitter.  They do have APIs that can be used to query their user data.  But it is all “rate-limited”, meaning only a certain number of requests can be made per IP address per day.  So it is impossible to get a running stream of all activity (a “video”) or even a snapshot of all activity at a single time (a “still camera”).  But what we can do is access the “Twitter Public Timeline”, which will give you the most recent 20 tweets.  This can be queried every 60 seconds, up to your daily limit.

I’ve been capturing the Twitter Public Timeline since late 2009.  I now have nearly 6 million records, each one containing the message, of course, but also the name of the user and their “Followers” and “Following” counts at that point in time.  I started doing scatter plots of this data and was amazed at the detailed structure evident in it, which illustrates some interesting ways in which Twitter is being used.  No single graph can show it all, so I’m giving you a series of charts, each one showing an area of the Following/Followers phase space 10x larger than the one before.

All charts here were done using the open source R environment.

One Thousand Followers

In this chart each pixel represents one Twitter user, plotted at a position reflecting how many people they are Following, and how many Followers they in turn have.  This chart is zoomed in to show only those whose Following/Follower counts are 1000 or fewer.

We see a few trends here.  First, there is a predominance of users with counts less than 300 or so.  But we also see a strong trend toward parity in counts.  That is the line going up to the right at 45 degrees.  This would be expected for socially-interacting groups of mutual followers.

What I did not expect were the “spikes” for users who follow 100, 200 and 300 accounts.  This is not an aliasing artifact of the graphing.  This is real.  Is there something out there that would lead large numbers of users to follow exactly 100, 200 or 300 users?
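One way to check that such spikes are in the data rather than in the rendering is to count how often each exact Following value occurs and compare it with its neighbours. The Python sketch below does that. The function name, the `factor` and `window` parameters, and the sample data are all hypothetical; the charts in this post were made in R and this is not the code behind them.

```python
from collections import Counter

def find_spikes(following_counts, factor=3.0, window=5):
    """Return values that occur more than `factor` times as often as
    the average of their neighbours within `window` on either side."""
    freq = Counter(following_counts)
    spikes = []
    for value in sorted(freq):
        neighbours = [freq.get(value + d, 0)
                      for d in range(-window, window + 1) if d != 0]
        baseline = sum(neighbours) / len(neighbours)
        # Floor the baseline at 1 so isolated rare values are not flagged.
        if freq[value] > factor * max(baseline, 1.0):
            spikes.append(value)
    return spikes

# Made-up sample: a flat background from 90 to 110, plus a pile-up at exactly 100.
sample = [v for v in range(90, 111) for _ in range(10)] + [100] * 90
print(find_spikes(sample))
```

Because this works on the raw integer counts, before any jitter or pixel mapping, a value it flags cannot be a plotting artifact.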

(For those of you interested in how the chart was created, I used alpha blending to deal with the “overplotting” problem.  So each point is plotted in a partially transparent way, so an area gets darker the greater the density of points.  If I didn’t do that, the entire chart would be one giant blot of black, with no discernible patterns.   I also introduced random “jitter” between -0.5 and 0.5 to avoid false patterns caused by integer quantization interacting with screen resolution.)
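For readers who want to see the two tricks in isolation, here is a small Python sketch. It assumes the usual "over" compositing rule for black points on a white background; the function names and the 0.05 opacity are illustrative choices, not the settings used for the charts here.

```python
import random

def jitter(value, rng, width=0.5):
    """Shift an integer count by a uniform random offset in [-width, width]."""
    return value + rng.uniform(-width, width)

def blended_darkness(num_points, alpha):
    """Darkness of a pixel (0 = white, 1 = black) after overplotting
    `num_points` black points, each drawn with opacity `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_points

rng = random.Random(42)
print([round(jitter(100, rng), 2) for _ in range(3)])

# Darkness keeps growing with density instead of saturating at the first point.
for n in (1, 10, 100):
    print(n, round(blended_darkness(n, alpha=0.05), 3))
```

With an opacity of 0.05, one point leaves the pixel barely tinted, ten points bring it to roughly 40% darkness, and a hundred make it almost black, which is why dense regions stand out while isolated users stay faint.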

Ten Thousand Followers

Moving out a factor of ten, we now look at those users who have 10,000 or fewer followers.  Again, each pixel represents one sampled user.  The entire previous chart would fit into the lower left corner.

The salient feature here is the hard cut-off at 2000.  This is due to Twitter’s “aggressive following” limitation:  “Once you’ve followed 2000 users, there are limits to the number of additional users you can follow: this limit is different for every user and is based on your ratio of followers to following.”  They are a bit coy about what exactly the rule is, but a look at the chart certainly suggests that having a Following/Followers ratio > 1 is going to be a problem.

We also see an unexplained density of people Following exactly 1000 users.

One Hundred Thousand Followers

Another factor of 10 and we switch to a different presentation, representing users with small circles rather than pixels.  We’re now starting to see recognizable users and information sources.  I’m illustrating some account names at random.   Maybe not exactly celebrities, but there are some broadly followed users here.  Since the only way to follow 100,000 users is to have close to that number already following you, the lower right half of the chart is empty, and will remain so as we continue to zoom out.

The structure here seems to be:

  • Information pushers who follow nearly no one, up the y-axis on the left.
  • Users who follow almost everyone who follows them, running diagonally
  • Nothing much in the middle
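The three groups above can be separated roughly by the Following/Followers ratio. The Python sketch below shows one way to do it. The cut-offs of 0.1 and 0.9 and the label names are arbitrary assumptions chosen for illustration; they are not derived from the data.

```python
def classify(following, followers, low_ratio=0.1, high_ratio=0.9):
    """Label a user by the ratio of accounts followed to followers.

    The thresholds are hypothetical; adjust them to taste.
    """
    if followers == 0:
        return "no followers"
    ratio = following / followers
    if ratio <= low_ratio:
        return "information pusher"   # hugging the y-axis
    if ratio >= high_ratio:
        return "mutual follower"      # near the diagonal
    return "middle"                   # the sparsely populated region

for following, followers in [(50, 80000), (60000, 62000), (30000, 60000)]:
    print(following, followers, classify(following, followers))
```

Counting how many sampled users land under each label would put a number on the "nothing much in the middle" impression.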

One Million Followers

Zooming out another factor of 10, we see that the Following count trails off.  Does Twitter have another limit here?  Or do people realize that it is pointless to follow 500,000 people?  But why wouldn’t they also see that it is senseless to follow 50,000 people?

Ten Million Followers

And in the last chart we take it out one more order of magnitude, and the Twitterverse recedes to be Ellen DeGeneres, Britney Spears, Barack Obama, Justin Bieber and Ashton Kutcher.   If you are an average Twitter user, like me, everyone you know and actually interact with on Twitter is represented by 1/20th of a pixel in the lower left corner of the chart.

Note that this chart (and the previous one) does not reflect the current Follower/Following count for these particular users.  This is not a concurrent snapshot.  This was all sampled over an 18-month period. Different users are necessarily shown according to their status at different dates.  The point is to show the structure of the data, not make a claim that, e.g., Ellen DeGeneres has more followers than Justin Bieber.

Filed Under: Blogging/Social

OASIS ODF 1.2 Committee Specification Approved

2011/03/25 By Rob 3 Comments

A few quick ODF updates.  We have a number of projects moving forward at multiple levels.

First, just last week the OASIS ODF TC approved the ODF 1.2 Committee Specification.  This is the highest level of approval we can give to the specification in the technical committee.

As some of you probably know, most standards bodies have a two-level approval process, where work originates in a technical committee (in some organizations called a working group) where the specification is written, reviewed and approved by specialists, before being passed on to a “consensus body”  for approval by a wider group of interests.  We see this in ISO/IEC JTC1, with work first approved at the WG/SC level, and then final approval given by JTC1.

An OASIS Committee Specification requires 2/3 approval of the TC, with no more than 25% disapproving.  ODF 1.2’s ballot ended last week with 17 Yes votes, 100%.
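The two-part threshold can be written as a small check. The Python sketch below assumes that both fractions are measured against the number of eligible voting members, which the paragraph above does not spell out, and that all 17 eligible members voted; the function name is hypothetical.

```python
from fractions import Fraction

def committee_spec_approved(yes, no, eligible):
    """Apply the rule as described: at least 2/3 approving and
    no more than 25% disapproving.

    Assumption: both fractions use eligible voting members as the denominator.
    """
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    enough_yes = Fraction(yes, eligible) >= Fraction(2, 3)
    few_enough_no = Fraction(no, eligible) <= Fraction(1, 4)
    return enough_yes and few_enough_no

# The ODF 1.2 ballot: 17 Yes votes, none opposed.
print(committee_spec_approved(yes=17, no=0, eligible=17))
```

Exact fractions are used rather than floating point so that a ballot landing exactly on 2/3 or 25% is decided without rounding error.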

The TC’s work on ODF 1.2 is now done.   There are some administrative tasks remaining, and we need to go through the review/approval by the general OASIS membership, but the technical work is now done.  We now move on to ODF 1.3, as well as some maintenance-related activities on ODF 1.1.

And speaking of maintenance, we have two ballots related to IS 26300 underway in ISO/IEC JTC1:

  • A DCOR ballot to approve technical corrigenda for ISO ODF, mainly correction of typographical errors reported by the UK and Japan.  This ballot will end April 25th.
  • An FPDAM ballot to approve an amendment to ISO ODF.  The effect of this amendment will be to make ISO ODF equivalent to OASIS ODF 1.1.  This ballot will end June 8th.

I’d urge NB members to review these documents carefully and cast a vote in these ballots.

On the ODF-Next side, the discussion that is getting the most attention right now is related to change tracking.  The Advanced Document Collaboration subcommittee is now reviewing two proposals, one contributed by DeltaXML and another contributed by Microsoft.  We’ll be having a series of meetings in April to discuss these two proposals.  Hopefully we’ll reach a consensus, possibly a compromise.  If necessary, as a last resort, we’ll vote.

Filed Under: ODF

Best Practices for Authoring Interoperable ODF Documents

2011/03/10 By Rob 2 Comments

In the OASIS ODF Interoperability and Conformance TC we have recently started work on a new document, a “Committee Note” which will be called, “Best Practices for Authoring Interoperable ODF Documents”.

I will be the editor for this document.

If you are not yet familiar with a “Committee Note”, it is a new category of document that has recently been added to the OASIS process.  Think of it as analogous to an ISO Technical Report.  A Committee Note (or CN) goes through the same level of review and approval within a Technical Committee, the same public review requirement, etc.  But it does not get approved as a standard, so it does not define, for example, conformance requirements.  It is intended for things like implementation guides, best practices, white papers, etc.

The general aim of the new CN is to collect and describe guidelines for authors on how best to create interoperable (portable) ODF documents.  What to do and what to avoid.  Although the focus will be on ODF, much of this will be applicable to any WYSIWYG word processing environment.

I’m thinking of this as being analogous to the “How to write portable C” books we saw years ago.  As many of you know, C programs can range from the perverse (see the Obfuscated C Competition for examples) to highly portable.  But portability does not come about by accident.  The language permits portability, but it does not enforce it on the user.  C is powerful enough for a user to hang themselves.

The modern WYSIWYG word processor is similar.  A user can create interoperable (portable) documents, but the word processor also allows them to create documents that will be tied tightly to their precise operating environment and will render poorly everywhere else.  The tool takes you only so far, and then user education must help with the rest of the way.  I hope that this Committee Note will provide some of that user education.

I am absolutely certain that I am not the first one to have thought about this problem.  In fact, I suspect (and hope) that many of my readers have done so themselves.  So before I start drafting this document, I’d like to solicit contributions of material.  Maybe you have written a paper, report or blog post on this topic?  Maybe you can take a few minutes to jot down your ideas?  Maybe you can refer us to other sources of information?

But please don’t give me the information here.  Per OASIS IPR rules we need to channel any contributions to the Technical Committee, so permission to use your original material is secured, from the copyright perspective.  So if you have a contribution you want to make for this document, please do so on the OIC TC’s comment list.  And if you want to participate more closely in the creation and editing of this document, then you are always welcome to join OASIS and participate directly in the TC’s work.  The cost for individual memberships is quite reasonable.

Filed Under: Interoperability, ODF

The BSA’s New Candlemakers

2011/03/03 By Rob 5 Comments

The Business Software Alliance (BSA) is at it again.  They are claiming that a new UK Cabinet Office policy in favor of open standards, the kind of standards that the web is built on and which have created billions in new economy jobs, is actually a bad thing, since it would (according to the BSA) “reduce choice, hinder innovation and increase the costs of e-government”.

Really?  Are they serious?

Those with a penchant for the history of economic thought may recall the 19th century French liberal economist Claude Frédéric Bastiat and his satirical economic parables, which attacked prevalent economic errors of his time.  We have need of Bastiat at this hour, especially his skewering of an entrenched industry’s rent-seeking push for government protection from lower cost competitors.  His attack on protectionism was called “The Candlemaker’s Petition”, and a portion of it reads like this:

We are suffering from the ruinous competition of a rival who apparently works under conditions so far superior to our own for the production of light that he is flooding the domestic market with it at an incredibly low price; for the moment he appears, our sales cease, all the consumers turn to him, and a branch of French industry whose ramifications are innumerable is all at once reduced to complete stagnation. This rival, which is none other than the sun, is waging war on us so mercilessly we suspect he is being stirred up against us by perfidious Albion (excellent diplomacy nowadays!), particularly because he has for that haughty island a respect that he does not show for us.

We ask you to be so good as to pass a law requiring the closing of all windows, dormers, skylights, inside and outside shutters, curtains, casements, bull’s-eyes, deadlights, and blinds — in short, all openings, holes, chinks, and fissures through which the light of the sun is wont to enter houses, to the detriment of the fair industries with which, we are proud to say, we have endowed the country, a country that cannot, without betraying ingratitude, abandon us today to so unequal a combat.

Bastiat then goes on to enumerate the benefits that would ensue, if only the government would shut out the sun:

First, if you shut off as much as possible all access to natural light, and thereby create a need for artificial light, what industry in France will not ultimately be encouraged?

If France consumes more tallow, there will have to be more cattle and sheep, and, consequently, we shall see an increase in cleared fields, meat, wool, leather, and especially manure, the basis of all agricultural wealth.

If France consumes more oil, we shall see an expansion in the cultivation of the poppy, the olive, and rapeseed. These rich yet soil-exhausting plants will come at just the right time to enable us to put to profitable use the increased fertility that the breeding of cattle will impart to the land.

Our moors will be covered with resinous trees. Numerous swarms of bees will gather from our mountains the perfumed treasures that today waste their fragrance, like the flowers from which they emanate. Thus, there is not one branch of agriculture that would not undergo a great expansion.

The same holds true of shipping. Thousands of vessels will engage in whaling, and in a short time we shall have a fleet capable of upholding the honour of France and of gratifying the patriotic aspirations of the undersigned petitioners, chandlers, etc.

But what shall we say of the specialities of Parisian manufacture? Henceforth you will behold gilding, bronze, and crystal in candlesticks, in lamps, in chandeliers, in candelabra sparkling in spacious emporia compared with which those of today are but stalls.

There is no needy resin-collector on the heights of his sand dunes, no poor miner in the depths of his black pit, who will not receive higher wages and enjoy increased prosperity.

It needs but a little reflection, gentlemen, to be convinced that there is perhaps not one Frenchman, from the wealthy stockholder of the Anzin Company to the humblest vendor of matches, whose condition would not be improved by the success of our petition.

“…and especially manure”.  Times have not changed much, have they?  I am so glad that IBM is no longer a BSA member.

Filed Under: Economics

Copyright © 2006-2023 Rob Weir · Site Policies