{"id":813,"date":"2010-03-06T14:48:28","date_gmt":"2010-03-06T19:48:28","guid":{"rendered":"http:\/\/2d823b65bb.nxcli.io\/?p=813"},"modified":"2011-03-25T08:00:38","modified_gmt":"2011-03-25T12:00:38","slug":"new-microsoft-shuffle","status":"publish","type":"post","link":"https:\/\/www.robweir.com\/blog\/2010\/03\/new-microsoft-shuffle.html","title":{"rendered":"The New &#038; Improved Microsoft Shuffle"},"content":{"rendered":"<p>A quick update on my post from last week on the &#8220;<a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/2010\/02\/microsoft-random-browser-ballot.html\">Microsoft Shuffle<\/a>&#8220;, where I looked at how Microsoft&#8217;s &#8220;random&#8221; browser ballot was far from random.<\/p>\n<p>First, I&#8217;d like to thanks those who commented on that post, or sent me notes, offering additional analysis.  I think we nailed this one.  Within a few days of my report Microsoft updated their JavaScript on the <a href=\"http:\/\/www.browserchoice.eu\">browserchoice.eu<\/a> website, fixing the error.  But more on that in a minute.<\/p>\n<h3>Some random observations<\/h3>\n<p>Several commenters mentioned that if you search Google for &#8220;javascript random array sort&#8221; the first link returned will be a JavaScript tutorial that has the same offending code as Microsoft&#8217;s algorithm.  This is not surprising.  As I said in my original post, this is a well-known mistake. But it is no less a mistake.  If you use Google Code Search for the query <a href=\"http:\/\/www.google.com\/codesearch?hl=en&amp;sa=N&amp;filter=0&amp;q=%220.5+-+Math.random%28%29%22++lang:javascript&amp;ct=rr&amp;cs_r=lang:javascript\">&#8220;0.5 &#8211; Math.random()&#8221; lang:javascript<\/a> you will find 50 or so other instances of the faulty algorithm.  So if anyone else is using this same algorithm, they should evaluate whether it is really sufficiently random for their needs.  In some case, such as a children&#8217;s game, it might be fine.  But know that there are better and faster algorithms available that are not much more complicated to code.<\/p>\n<p>Another thing to note is that the Microsoft Shuffle algorithm is bad enough with 5-elements in the array, but the non-randomness gets more pronounced as you increase the length of the array.  Regardless of the size of the array, it appears that on Internet Explorer the 1st element will end up in last place 50% of the time.  There are other pronounced patterns as well.  You can see this yourself this <a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/attachments\/shuffle\/shuffle-variable.html\">this test file<\/a>, which allows you to specify the size of the array as well as the number of iterations.  Try a 50-element array for 10,000 iterations to get a good sense of how non-random the results can be.<\/p>\n<p>I used that script to run a large test of 1,000,000 iterations of a 1024-element array. The raw results are <a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/attachments\/shuffle\/ms-million-ie.txt\">here<\/a>.  I took that table, and using R&#8217;s image() function produced a rendering of that matrix.  You can see here the clear over-representation at some positions, including (in the lower left) the flip of the first position to last place.   (I&#8217;m not quite satisfied with this rendering.  Maybe someone can get a better-looking visualization of this same data.)<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" title=\"Distribution of positions\" src=\"https:\/\/2d823b65bb.nxcli.io\/blog\/attachments\/shuffle\/heatmap.png\" alt=\"\" width=\"570\" height=\"516\" \/><\/p>\n<h3>Evaluating Microsoft&#8217;s new shuffle<\/h3>\n<p>Sometime last week &#8212; I don&#8217;t know the exact date &#8212; Microsoft updated the code for the browser choice website with a new random shuffle algorithm.  You see the code, in situ, <a href=\"http:\/\/www.browserchoice.eu\/resources\/scripts\/page.js\">here<\/a>.  The core of it is in this function:<\/p>\n<pre class=\"brush: jscript; title: ; notranslate\" title=\"\">\r\nfunction ArrayShuffle(a)\r\n{\r\n    var d, c, b=a.length;\r\n    while(b)\r\n    {\r\n        c=Math.floor(Math.random()*b);\r\n        d=a&#x5B;--b];\r\n        a&#x5B;b]=a&#x5B;c];\r\n        a&#x5B;c]=d\r\n     }\r\n}\r\n<\/pre>\n<p>This looks fine to me.  I created a new test driver for this routine, which you can try out <a href=\"https:\/\/2d823b65bb.nxcli.io\/blog\/attachments\/shuffle\/shuffle-new.html\">here<\/a>.  Aside from being much faster, it is gives much better results.  Here is a run with a million iterations:<\/p>\n<p>&nbsp;<\/p>\n<h4>Raw counts<\/h4>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>Position<\/th>\n<th>I.E.<\/th>\n<th>Firefox<\/th>\n<th>Opera<\/th>\n<th>Chrome<\/th>\n<th>Safari<\/th>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>199988<\/td>\n<td>200754<\/td>\n<td>199944<\/td>\n<td>199431<\/td>\n<td>199883<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>200320<\/td>\n<td>200016<\/td>\n<td>199838<\/td>\n<td>199752<\/td>\n<td>200074<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>199702<\/td>\n<td>199680<\/td>\n<td>199911<\/td>\n<td>200865<\/td>\n<td>199842<\/td>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>200408<\/td>\n<td>200286<\/td>\n<td>199740<\/td>\n<td>199861<\/td>\n<td>199705<\/td>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>199582<\/td>\n<td>199264<\/td>\n<td>200567<\/td>\n<td>200091<\/td>\n<td>200496<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h4>Fraction of total<\/h4>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>Position<\/th>\n<th>I.E.<\/th>\n<th>Firefox<\/th>\n<th>Opera<\/th>\n<th>Chrome<\/th>\n<th>Safari<\/th>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>0.2000<\/td>\n<td>0.2008<\/td>\n<td>0.1999<\/td>\n<td>0.1994<\/td>\n<td>0.1999<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>0.2003<\/td>\n<td>0.2000<\/td>\n<td>0.1998<\/td>\n<td>0.1998<\/td>\n<td>0.2001<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>0.1997<\/td>\n<td>0.1997<\/td>\n<td>0.1999<\/td>\n<td>0.2009<\/td>\n<td>0.1998<\/td>\n<\/tr>\n<tr>\n<td>4<\/td>\n<td>0.2004<\/td>\n<td>0.2003<\/td>\n<td>0.1997<\/td>\n<td>0.1999<\/td>\n<td>0.1997<\/td>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>0.1996<\/td>\n<td>0.1993<\/td>\n<td>0.2006<\/td>\n<td>0.2001<\/td>\n<td>0.2005<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And the results of the Chi-square test:<\/p>\n<pre>X-squared = 18.9593, df = 16, p-value = 0.2708<\/pre>\n<h3>Final thoughts<\/h3>\n<p>In the end I don&#8217;t think it is reasonable to expect every programmer to memorize the Fisher-Yates algorithm. These things belong in our standard libraries. \u00a0 But what I would expect every programmer to know is:<\/p>\n<ol>\n<li>That the problem here is one that requires a &#8220;random shuffle&#8221;.  If you don&#8217;t know what it is called, then it will be difficult to lookup the known approaches.  So this is partially a vocabulary problem. We, as programmers, have a shared vocabulary which we use to describe data structures and algorithms; binary searches, priority heaps, tries, and dozens of other concepts.  I don&#8217;t blame anyone for not memorizing algorithms, but I would expect a programmer to know what types of algorithms apply to their work.<\/li>\n<li>How to research which algorithm to use in a specific context, including where to find reliable information, how to evaluate the classic trade-offs of time and space, etc.\u00a0 There is almost always more than one way to solve a problem.<\/li>\n<li>That where randomized outputs are needed,\u00a0 the outputs should be statistically tested.  I would not expect the average programmer to know how to do a chi-square test, or even to know what one is.  But I would expect a mature programmer to know either find this out or seek help.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>A quick update on my post from last week on the &#8220;Microsoft Shuffle&#8220;, where I looked at how Microsoft&#8217;s &#8220;random&#8221; browser ballot was far from random. First, I&#8217;d like to thanks those who commented on that post, or sent me notes, offering additional analysis. I think we nailed this one. Within a few days of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[35],"tags":[],"class_list":{"0":"post-813","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-microsoft","7":"entry"},"_links":{"self":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/comments?post=813"}],"version-history":[{"count":20,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/813\/revisions"}],"predecessor-version":[{"id":821,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/posts\/813\/revisions\/821"}],"wp:attachment":[{"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/media?parent=813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/categories?post=813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.robweir.com\/blog\/wp-json\/wp\/v2\/tags?post=813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}