{"id":1648,"date":"2011-03-25T10:26:18","date_gmt":"2011-03-25T14:26:18","guid":{"rendered":"http:\/\/2d823b65bb.nxcli.io\/?p=1648"},"modified":"2016-12-16T09:27:30","modified_gmt":"2016-12-16T14:27:30","slug":"twitter-powers-of-ten","status":"publish","type":"post","link":"https:\/\/www.robweir.com\/blog\/2011\/03\/twitter-powers-of-ten.html","title":{"rendered":"Twitter Powers of Ten"},"content":{"rendered":"<h2>Time-Based Profiling<\/h2>\n<p>Before any of this will make sense, I ask you to imagine doing a survey of your local shopping mall or other busy commercial shopping district.\u00a0 You want to know where people congregate, where they spend most of their time.\u00a0 Is it in a particular shop, in the food court, or in some dark corner of the parking garage?<\/p>\n<p>There are a few ways of solving this problem:<\/p>\n<ul>\n<li>You could have a video capture of the entire complex, digitize that data and map where everyone is.\u00a0 Aggregate over a representative time interval (days?\u00a0 weeks?) 
and you will have a good idea where people hang out.\u00a0 The downside of this approach is that it requires an expensive and complex camera system and generates a massive amount of data.<\/li>\n<li>Another approach would be to do this with a series of still cameras that cover the entire mall.\u00a0 Take a snapshot at periodic intervals.\u00a0 A bit less expensive, but still requires &#8220;getting everyone in the frame&#8221;.<\/li>\n<li>Yet another approach is to sample both by time and by location.\u00a0 So don&#8217;t install cameras all over the mall.\u00a0 Have one hand-held camera, and take a picture in the book store one minute, another picture in the food court another minute, etc.\u00a0 Aim for coverage over time and locations.\u00a0 And repeat, repeat, repeat.\u00a0 Take thousands of samples.\u00a0 This is low tech on the data capture side, but can still generate massive amounts of data.<\/li>\n<\/ul>\n<p>So three approaches.\u00a0 Obviously some approaches are easier to implement for the owner of the mall.\u00a0 But only the last one is doable by the average citizen.<\/p>\n<p>This is essentially the situation we find ourselves in with Twitter.\u00a0 They do have APIs that can be used to query their user data.\u00a0 But it is all &#8220;rate-limited&#8221;, meaning only a certain number of requests can be made per IP address per day.\u00a0 So it is impossible to get a running stream of all activity (a &#8220;video&#8221;) or even a snapshot of all activity at a single time (a &#8220;still camera&#8221;).\u00a0 But what we can do is access the &#8220;<a href=\"http:\/\/twitter.com\/public_timeline\">Twitter Public Timeline<\/a>&#8221;, which will give you the most recent 20 tweets.\u00a0 This can be queried every 60 seconds, up to your daily limit.<\/p>\n<p>I&#8217;ve been capturing the Twitter Public Timeline since late 2009.\u00a0 I now have nearly 6 million records, each one containing the message, of course, but also the name of the user and 
their &#8220;Followers&#8221; and &#8220;Following&#8221; count at that point in time.\u00a0 I started doing scatter plots of this data and was amazed at the detailed structure evident in it, which illustrates some interesting ways in which Twitter is being used.\u00a0 No single graph can show it all, so I&#8217;m giving you a series of charts, each one showing an area of the Following\/Followers phase space 10x larger.<\/p>\n<p>All charts here were done using the open source <a href=\"http:\/\/www.r-project.org\/\">R environment<\/a>.<\/p>\n<h2>One Thousand Followers<\/h2>\n<p>In this chart each pixel represents one Twitter user, plotted at a position reflecting how many people they are Following, and how many Followers they in turn have.\u00a0 This chart is zoomed in to show only those whose Following\/Follower counts are 1000 or fewer.<\/p>\n<p>We see a few trends here.\u00a0 First, there is a predominance of users with counts less than 300 or so.\u00a0 But we also see a strong trend toward parity in counts.\u00a0 That is the line going up to the right at 45 degrees.\u00a0 This would be expected for socially-interacting groups of mutual followers.<\/p>\n<p>What I did not expect were the &#8220;spikes&#8221; for users who follow 100, 200 and 300 accounts.\u00a0 This is not an aliasing artifact of the graphing.\u00a0 This is real.\u00a0 Is there something out there that would lead large numbers of users to follow exactly 100, 200 or 300 users?<\/p>\n<p>(For those of you interested in how the chart was created, I used alpha blending to deal with the &#8220;overplotting&#8221; problem.\u00a0 Each point is plotted in a partially transparent way, so an area gets darker the greater the density of points.\u00a0 If I didn&#8217;t do that, the entire chart would be one giant blot of black, with no discernible patterns.\u00a0 I also added random &#8220;jitter&#8221; between -0.5 and 0.5 to each count to avoid false patterns caused by integer quantization interacting 
with screen resolution.)<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"1000\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/images\/twitter_3.jpg\" alt=\"\" width=\"672\" height=\"671\" \/><\/p>\n<h2>Ten Thousand Followers<\/h2>\n<p>Moving out a factor of ten, we now look at those users who have 10,000 or fewer followers.\u00a0 Again, each pixel represents one sampled user.\u00a0 The entire previous chart would fit into the lower left corner.<\/p>\n<p>The salient feature here is the hard cut-off at 2000.\u00a0 This is due to Twitter&#8217;s &#8220;<a href=\"http:\/\/support.twitter.com\/entries\/68916-following-rules-and-best-practices\">aggressive following<\/a>&#8221; limitation:\u00a0 &#8220;Once you\u2019ve followed 2000 users, there are limits to the number of additional users you can follow: this limit is different for every user and is based on your ratio of followers to following.&#8221;\u00a0 They are a bit coy about what exactly the rule is, but a look at the chart certainly suggests that having a Following\/Followers ratio &gt; 1 is going to be a problem.<\/p>\n<p>We also see an unexplained density of people Following exactly 1000 users.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"10000\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/images\/twitter_4.jpg\" alt=\"\" width=\"672\" height=\"671\" \/><\/p>\n<h2>One Hundred Thousand Followers<\/h2>\n<p>Another factor of 10 and we switch to a different presentation, representing users with small circles rather than pixels.\u00a0 We&#8217;re now starting to see recognizable users and information sources.\u00a0 I&#8217;m labeling some accounts at random.\u00a0 Maybe not exactly celebrities, but there are some broadly followed users here.\u00a0 Since the only way to follow 100,000 users is to have close to that number already following you, the lower right half of the chart is empty, and will remain so as we continue to zoom out.<\/p>\n<p>The structure here seems to 
be:<\/p>\n<ul>\n<li>Information pushers who follow nearly no one, up the y-axis on the left.<\/li>\n<li>Users who follow almost everyone who follows them, running diagonally.<\/li>\n<li>Nothing much in the middle.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"100000\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/images\/twitter_5.jpg\" alt=\"\" width=\"672\" height=\"671\" \/><\/p>\n<h2>One Million Followers<\/h2>\n<p>Zooming out another factor of 10, we see that the Following count trails off.\u00a0 Does Twitter have another limit here?\u00a0 Or do people realize that it is pointless to follow 500,000 people?\u00a0 But why wouldn&#8217;t they also see that it is senseless to follow 50,000 people?<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"1000000\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/images\/twitter_6.jpg\" alt=\"\" width=\"672\" height=\"671\" \/><\/p>\n<h2>Ten Million Followers<\/h2>\n<p>And in the last chart we take it out one more order of magnitude, and the Twitterverse recedes to be Ellen DeGeneres, Britney Spears, Barack Obama, Justin Bieber and Ashton Kutcher.\u00a0 If you are an average Twitter user, like me, everyone you know and actually interact with on Twitter is represented by 1\/20th of a pixel in the lower left corner of the chart.<\/p>\n<p>Note that this chart (and the previous one) does not reflect the current Follower\/Following count for these particular users.\u00a0 This is not a concurrent snapshot.\u00a0 This was all sampled over an 18-month period of time. 
Different users are necessarily shown according to their status at different dates.\u00a0 The point is to show the structure of the data, not make a claim that, e.g., Ellen DeGeneres has more followers than Justin Bieber.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" title=\"10000000\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/images\/twitter_7.jpg\" alt=\"\" width=\"672\" height=\"671\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Time-Based Profiling Before any of this will make sense, I ask you to imagine doing a survey of your local shopping mall or other busy commercial shopping district.\u00a0 You want to know where people congregate, where they spend most of their time.\u00a0 Is it in a particular shop, in the food court, or in some [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[25],"tags":[],"class_list":{"0":"post-1648","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-blogging","7":"entry"},"_links":{"self":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/1648","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/comments?post=1648"}],"version-history":[{"count":13,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/1648\/revisions"}],"predecessor-version":[{"id":2528,"href":"https:
\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/1648\/revisions\/2528"}],"wp:attachment":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/media?parent=1648"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/categories?post=1648"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/tags?post=1648"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}