
An Antic Disposition


Rob

PJ, Goodbye and Good Luck

2011/05/10 By Rob 7 Comments

There was a time when daggers were drawn on Linux and its demise was plotted in dark detail.  At that hour stepped out a shieldmaiden with a blog, and that blog was Groklaw.   Eight years later, we hear the news that Groklaw will cease new postings after May 16th.  My sadness in hearing this news is more than equaled by my gratitude to PJ and her community of researchers and commentators, for their enormous effort and unparalleled achievement over these years.   The world is a better place because of PJ.  Who can hope to say better?

As a retrospective of a different kind, I’ve taken the titles from every Groklaw article since its start and created a “word cloud” from them, using Wordle.  This shows, at a glance, the issues that have dominated the attention of Groklaw over the years.

Filed Under: Open Source

Ten Things You Didn’t Know About ODF 1.2

2011/05/05 By Rob 6 Comments

Some little known facts, all of them true, but only some of them amusing, and even then only just so, about ODF 1.2, recently approved as a Committee Specification by the OASIS ODF TC:

  1. In producing OASIS ODF 1.2, we had 184 Technical Committee meetings, not including the numerous subcommittee meetings.
  2. During the development of ODF 1.2, the active TC membership grew by 78%.
  3. During the ODF 1.2 work, the ODF TC had 76 members, from 17 countries, representing 23 companies or organizations, as well as 17 individual members.  The sun never sets on the ODF TC.
  4. ODF TC members received 14,655 emails from the TC’s email list while working on ODF 1.2, including 474 notes with a post-script (PS), 113 with a post-post-script (PPS) and 13 with a post-post-post-script (PPPS), suggesting a new phrase for derangement:  “going postscript”.
  5. ODF 1.2 has been out for public review a total of 210 days.
  6. The ODF TC resolved 1,822 public comments while working on ODF 1.2.  We read every one of them.
  7. ODF 1.2 says “shall” 628 times, but says “please” only 14 times, making it one of the most discourteous specifications around.
  8. ODF 1.2 has 72 external normative references and 16 external non-normative references.
  9. If you printed out all of ODF 1.2 and laid the pages end-to-end, it would be approximately 20% taller than the Eiffel Tower.  You would also probably be arrested.
  10. ODF 1.2’s OpenFormula knows how many imperial pints will fill a cubic light year.  But please, drink only in moderation.
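For the curious, the arithmetic behind item 10 can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not how OpenFormula does its unit conversion. The function and constant names are made up for illustration. It assumes a light year defined by a Julian year of 365.25 days and an imperial pint of 568.26125 millilitres.

```python
# Hypothetical constants for illustration; not taken from the ODF specification.
SPEED_OF_LIGHT_M_PER_S = 299_792_458          # metres per second (exact by definition)
JULIAN_YEAR_S = 86_400 * 365.25               # seconds in a Julian year
LIGHT_YEAR_M = SPEED_OF_LIGHT_M_PER_S * JULIAN_YEAR_S
IMPERIAL_PINT_M3 = 568.26125e-6               # one imperial pint in cubic metres


def imperial_pints_per_cubic_light_year() -> float:
    """Volume of a cube one light year on a side, expressed in imperial pints."""
    return LIGHT_YEAR_M ** 3 / IMPERIAL_PINT_M3


if __name__ == "__main__":
    print(f"{imperial_pints_per_cubic_light_year():.3e} pints")
```

Under those assumptions the answer comes out at roughly 1.5 × 10^51 pints.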

Filed Under: ODF

Twitter Powers of Ten

2011/03/25 By Rob 19 Comments

Time-Based Profiling

Before any of this will make sense, I ask you to imagine doing a survey of your local shopping mall or other busy commercial shopping district.  You want to know where people congregate, where they spend most of their time.  Is it in a particular shop, in the food court, or in some dark corner of the parking garage?

There are a few ways of solving this problem:

  • You could have a video capture of the entire complex, digitize that data and map where everyone is.  Aggregate over a representative time interval (days?  weeks?) and you will have a good idea where people hang out.  The downside of this approach is that it requires an expensive and complex camera system,  and generates a massive amount of data.
  • Another approach would be to do this with a series of still cameras that cover the entire mall.  Take a snapshot at regular intervals.  A bit less expensive, but still requires “getting everyone in the frame”.
  • Yet another approach is to sample both by time and by location.  So don’t install cameras all over the mall.  Have one hand-held camera, and take a picture in the book store one minute, another picture in the food court another minute, etc.  Aim for coverage over time and locations.  And repeat, repeat, repeat.  Take thousands of samples.  This is low tech on the data capture side, but can still generate massive amounts of data.

So three approaches.  Obviously some approaches are easier to implement for the owner of the mall.  But only the last one is doable by the average citizen.

This is essentially the situation we find ourselves in with Twitter.  They do have APIs that can be used to query their user data.  But it is all “rate-limited”, meaning only a certain number of requests can be made per IP address per day.  So it is impossible to get a running stream of all activity (a “video”) or even a snapshot of all activity at a single time (a “still camera”).  But what we can do is access the “Twitter Public Timeline”, which returns the 20 most recent tweets.  This can be queried every 60 seconds, up to your daily limit.
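The sampling loop described above can be sketched roughly as follows. This is a minimal illustration, not the script I actually ran. The `fetch` callable stands in for whatever call returns the latest batch of public tweets. The record fields (`id`, `followers`, `following`) and the function name are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Iterable, List


def sample_timeline(
    fetch: Callable[[], Iterable[dict]],
    polls: int,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Poll `fetch` a fixed number of times, keeping each tweet id only once.

    `fetch` is a hypothetical stand-in for a rate-limited timeline request.
    """
    seen: Dict[str, dict] = {}
    for i in range(polls):
        for tweet in fetch():
            # Consecutive polls may overlap, so de-duplicate by tweet id.
            seen.setdefault(tweet["id"], tweet)
        if i < polls - 1:
            # Wait between requests to stay under the rate limit.
            sleep(interval_seconds)
    return list(seen.values())
```

Passing `sleep` in as a parameter keeps the loop easy to try out without actually waiting a minute between polls.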

I’ve been capturing the Twitter Public Timeline since late 2009.  I now have nearly 6 million records, each one containing the message, of course, but also the name of the user and their “Followers” and “Following” counts at that point in time.  I started doing scatter plots of this data and was amazed at the detailed structure evident in it, which illustrates some interesting ways in which Twitter is being used.  No single graph can show it all, so I’m giving you a series of charts, each one showing an area of the Following/Followers phase space 10x larger than the last.

All charts here were done using the open source R environment.

One Thousand Followers

In this chart each pixel represents one Twitter user, plotted at a position reflecting how many people they are Following, and how many Followers they in turn have.  This chart is zoomed in to show only those whose Following/Follower counts are 1000 or fewer.

We see a few trends here.  First, there is a predominance of users with counts less than 300 or so.  But we also see a strong trend toward parity in counts.  That is the line going up to the right at 45 degrees.  This would be expected for socially-interacting groups of mutual followers.

What I did not expect were the “spikes” for users who follow 100, 200 and 300 accounts.  This is not an aliasing artifact of the graphing.  This is real.  Is there something out there that would lead large numbers of users to follow exactly 100, 200 or 300 users?

(For those of you interested in how the chart was created, I used alpha blending to deal with the “overplotting” problem.  So each point is plotted in a partially transparent way, so an area gets darker the greater the density of points.  If I didn’t do that, the entire chart would be one giant blot of black, with no discernible patterns.   I also introduced random “jitter” between -0.5 and 0.5 to avoid false patterns caused by integer quantization interacting with screen resolution.)
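For readers who want to experiment, here is a small Python sketch of the two ideas, jitter and alpha blending. It is not the R code used for the charts. The function names are made up for the example, and the opacity formula assumes ordinary “source-over” compositing, where each partially transparent point lets a fraction (1 − alpha) of what is underneath show through.

```python
import random
from typing import List, Tuple


def jitter(
    counts: List[Tuple[int, int]], rng: random.Random
) -> List[Tuple[float, float]]:
    """Shift each (following, followers) pair by uniform noise in [-0.5, 0.5]."""
    return [
        (x + rng.uniform(-0.5, 0.5), y + rng.uniform(-0.5, 0.5))
        for x, y in counts
    ]


def stacked_opacity(n_points: int, alpha: float) -> float:
    """Opacity of one spot after n_points are drawn on it, each with opacity alpha.

    Assumes source-over compositing: every point leaves (1 - alpha) of the
    layer beneath it visible, so n points leave (1 - alpha) ** n visible.
    """
    return 1.0 - (1.0 - alpha) ** n_points
```

With a small alpha, a handful of overlapping points stays faint while hundreds approach solid black, which is what lets dense regions stand out.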

Ten Thousand Followers

Moving out a factor of ten, we now look at those users who have 10,000 or fewer followers.  Again, each pixel represents one sampled user.  The entire previous chart would fit into the lower left corner.

The salient feature here is the hard cut-off at 2000.  This is due to Twitter’s “aggressive following” limitation:  “Once you’ve followed 2000 users, there are limits to the number of additional users you can follow: this limit is different for every user and is based on your ratio of followers to following.”  They are a bit coy about what exactly the rule is, but a look at the chart certainly suggests that having a Following/Followers ratio > 1 is going to be a problem.

We also see an unexplained density of people Following exactly 1000 users.

One Hundred Thousand Followers

Another factor of 10 and we switch to a different presentation, representing users with small circles rather than pixels.  We’re now starting to see recognizable users and information sources.  I’m illustrating some account names at random.   Maybe not exactly celebrities, but there are some broadly followed users here.  Since the only way to follow 100,000 users is to have close to that number already following you, the lower right half of the chart is empty, and will remain so as we continue to zoom out.

The structure here seems to be:

  • Information pushers who follow nearly no one, up the y-axis on the left.
  • Users who follow almost everyone who follows them, running diagonally.
  • Nothing much in the middle.

One Million Followers

Zooming out another factor of 10, and we see that the Following count trails off.  Does Twitter have another limit here?  Or do people realize that it is pointless to follow 500,000 people?  But why wouldn’t they also see that it is senseless to follow 50,000 people?

Ten Million Followers

And in the last chart we take it out one more order of magnitude, and the Twitterverse recedes to be Ellen DeGeneres, Britney Spears, Barack Obama, Justin Bieber and Ashton Kutcher.   If you are an average Twitter user, like me, everyone you know and actually interact with on Twitter is represented by 1/20th of a pixel in the lower left corner of the chart.

Note that this chart (and the previous one) does not reflect the current Follower/Following count for these particular users.  This is not a concurrent snapshot.  This was all sampled over an 18-month period. Different users are necessarily shown according to their status at different dates.  The point is to show the structure of the data, not to make a claim that, e.g., Ellen DeGeneres has more followers than Justin Bieber.

Filed Under: Blogging/Social

OASIS ODF 1.2 Committee Specification Approved

2011/03/25 By Rob 3 Comments

A few quick ODF updates.  We have a number of projects moving forward at multiple levels.

First, just last week the OASIS ODF TC approved the ODF 1.2 Committee Specification.  This is the highest level of approval we can give to the specification in the technical committee.

As some of you probably know, most standards bodies have a two-level approval process, where work originates in a technical committee (in some organizations called a working group) where the specification is written, reviewed and approved by specialists, before being passed on to a “consensus body”  for approval by a wider group of interests.  We see this in ISO/IEC JTC1, with work first approved at the WG/SC level, and then final approval given by JTC1.

An OASIS Committee Specification requires 2/3 approval of the TC, with no more than 25% disapproving.  ODF 1.2’s ballot ended last week with 17 Yes votes, 100%.
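As a rough illustration of the two thresholds, here is a Python sketch. It is not the official OASIS tallying procedure. It assumes both percentages are measured against the number of voting TC members, and the function and parameter names are made up for the example.

```python
def committee_spec_approved(yes: int, no: int, voting_members: int) -> bool:
    """Apply the two ballot thresholds described above to a tally.

    Assumption: both fractions are taken over `voting_members`.
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    if voting_members <= 0:
        return False
    enough_yes = 3 * yes >= 2 * voting_members   # at least 2/3 approve
    few_enough_no = 4 * no <= voting_members     # no more than 25% disapprove
    return enough_yes and few_enough_no
```

Under that assumption, a unanimous ballot of 17 Yes votes from 17 voting members clears both thresholds comfortably.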

The TC’s work on ODF 1.2 is now done.   There are some administrative tasks remaining, and we need to go through the review/approval by the general OASIS membership, but the technical work is now done.  We now move on to ODF 1.3, as well as some maintenance-related activities on ODF 1.1.

And speaking of maintenance, we have two ballots related to IS 26300 underway in ISO/IEC JTC1:

  • A DCOR ballot to approve technical corrigenda for ISO ODF, mainly correction of typographical errors reported by the UK and Japan.  This ballot will end April 25th.
  • An FPDAM ballot to approve an amendment to ISO ODF.  The effect of this amendment will be to make ISO ODF equivalent to OASIS ODF 1.1.  This ballot will end June 8th.

I’d urge NB members to review these documents carefully and cast a vote in these ballots.

On the ODF-Next side, the discussion that is getting the most attention right now is related to change tracking.  The Advanced Document Collaboration subcommittee is now reviewing two proposals, one contributed by DeltaXML and another contributed by Microsoft.  We’ll be having a series of meetings in April to discuss these two proposals.  Hopefully we’ll reach a consensus, possibly a compromise.  If necessary, as a last resort, we’ll vote.

Filed Under: ODF

Best Practices for Authoring Interoperable ODF Documents

2011/03/10 By Rob 2 Comments

In the OASIS ODF Interoperability and Conformance TC we have recently started work on a new document, a “Committee Note” which will be called, “Best Practices for Authoring Interoperable ODF Documents”.

I will be the editor for this document.

If you are not yet familiar with a “Committee Note”, it is a new category of document that has recently been added to the OASIS process.  Think of it as being analogous to an ISO Technical Report.  A Committee Note (or CN) goes through the same level of review and approval within a Technical Committee, the same public review requirement, etc.  But it does not get approved as a standard, so it does not define, for example, conformance requirements.  It is intended for things like implementation guides, best practices, white papers, etc.

The general aim of the new CN is to collect and describe guidelines for authors on how best to create interoperable (portable) ODF documents.  What to do and what to avoid.  Although the focus will be on ODF, much of this will be applicable to any WYSIWYG word processing environment.

I’m thinking of this as being analogous to the “How to write portable C” books we saw years ago.  As many of you know, C programs can range from the perverse (see the Obfuscated C Competition for examples) to highly portable.  But portability does not come about by accident.  The language permits portability, but it does not enforce it on the user.  C is powerful enough for a user to hang themselves.

The modern WYSIWYG word processor is similar.  A user can create interoperable (portable) documents, but the word processor also allows them to create documents that will be tied tightly to their precise operating environment and will render poorly everywhere else.  The tool takes you only so far, and then user education must help with the rest of the way.  I hope that this Committee Note will provide some of that user education.

I am absolutely certain that I am not the first one to have thought about this problem.  In fact, I suspect (and hope) that many of my readers have done so themselves.  So before I start drafting this document, I’d like to solicit contributions of material.  Maybe you have written a paper, report or blog post on this topic?  Maybe you can take a few minutes to jot down your ideas?  Maybe you can refer us to other sources of information?

But please don’t give me the information here.  Per OASIS IPR rules we need to channel any contributions to the Technical Committee, so permission to use your original material is secured, from the copyright perspective.  So if you have a contribution you want to make for this document, please do so on the OIC TC’s comment list.  And if you want to participate more closely in the creation and editing of this document, then you are always welcome to join OASIS and participate directly in the TC’s work.  The cost for individual memberships is quite reasonable.

Filed Under: Interoperability, ODF


Copyright © 2006-2026 Rob Weir · Site Policies