1/23/2007 — A translation of this post, in Spanish has been provided by a reader. You can find it in the Los Trylobytes blog.
I’ve also taken this opportunity to update page and section references to refer to the final approved version of the Ecma Office Open XML specification, as well as providing a link to the final specification.
Early civilizations tried to rationalize the motions of the heavenly bodies. The sun rises and sets and they called that length of time a “day”. The moon changes phases and they called a complete cycle a “month”. And the sun moves through the signs of the zodiac and they called that a “year”. Unfortunately, these various lengths of time are not nice integral multiples of each other. A lunar month is not exactly 30 days. A solar year is not exactly 12 lunar months.
To work around these problems, civil calendars were introduced — some of the world’s first international standards — to provide a common understanding of date reckoning, without which commerce, justice and science would remain stunted.
In 45 B.C., Julius Caesar directed that an extra day be added to February every four years. (Interestingly, this extra day was not a February 29th as we have today in leap years, but by making February 24th last for two days.) This Julian System was in use for a long time, though even it has slight errors. By having a leap year every four years, we had 100 leap years every 400 years. However, to keep the seasons aligned properly with church feasts, etc., (who wants to celebrate Easter in Winter?) it was necessary to have only 97 leap years every 400 years.
So, in 1582 Pope Gregory XIII promulgated a new way of calculating leap years, saying that years divisible by 100 would be leap years only if they were also evenly divisible by 400. So, the year 1600 and 2000 were leap years, but 1700, 1800 and 1900 were not leap years. This Gregorian calendar was initial adopted by Catholic nations like Spain, Italy, France, etc. Protestant nations pretty much had adopted it by 1752, and Orthodox countries later, Russia after their 1918 revolution and Greece in 1923.
So, for most of the world, the Gregorian calendar has been the law for 250-425 years. That’s a well-established standard by anyone’s definition. Who would possibly ignore it or get it wrong at this point?
If you guessed “Microsoft”, you may advance to the head of the class.
Datetimes in Excel are represented as date serial numbers, where dates are counted from an origin, sometimes called an epoch, of January 1st, 1900. The problem is that from the earliest implementations Excel got it wrong. It thinks that 1900 was a leap year, when clearly it isn’t, under Gregorian rules since it is not divisible by 400. This error causes functions like the WEEKDAY() spreadsheet function to return incorrect values in some cases. See the Microsoft support article on this issue.
Now I have no problems with that bug remaining in Excel for backwards compatibility reasons. That’s an issue between Microsoft and their customers and not my concern. However, I am quite distressed to see this bug promoted into a requirement in the Ecma Office Open XML (OOXML) specification. From Section 3.17.41 of SpreadsheetML Reference Material, page 3305 of the OOXML specification (warning 49MB PDF download!) , “Date Representation”:
For legacy reasons, an implementation using the 1900 date base system shall treat 1900 as though it was a leap year. [Note: That is, serial value 59 corresponds to February 28, and serial value 61 corresponds to March 1, the next day, allowing the (nonexistent) date February 29 to have the serial value 60. end note] A consequence of this is that for dates between January 1 and February 28, WEEKDAY shall return a value for the day immediately prior to the correct day, so that the (nonexistent) date February 29 has a day-of-the-week that immediately follows that of February 28, and immediately precedes that of March 1.
So the new OOXML standard now contradicts 400 years of civil calendar practice, encodes nonexistent dates and returns the incorrect value for WEEKDAY()? And this is the mandated normative behavior? Is this some sort of joke?
The “legacy reasons” argument is entirely bogus. Microsoft could have easily have defined the XML format to require correct dates and managed the compatibility issues when loading/saving files in Excel. A file format is not required to be identical to an application’s internal representation.
Here is how I would have done it. Define the OOXML specification to encode dates using serial numbers that respect the Gregorian leap year calculations used by 100% of the nations on the planet. Then, if Microsoft desires to maintain this bug in their product, then have Excel add 1 to every date serial number of 60 or greater when loading, and subtract 1 from every such date when saving an OOXML file. This is not rocket science. In any case, don’t mandate the bug for every other processor of OOXML. And certainly don’t require that every person who wants the correct day of the week in 1900 to perform an extra calculation.
Sure this requires extra code to be added to Excel. Excel has a bug. Of course it will require code to fix a bug. Deal with it. I think the alternative of forcing the rest of the world to a adopt a new calendar system is the ultimate in chutzpah. The burden of a bug should fall on the product that has the bug, not with everyone else in the world.
Further, I’d note that section 3.2.28 (page 2693) defines a workbookPr (Workbook Properties) element with several attributes including the following flag:
date1904 (Date 1904)
Specifies a boolean value that indicates whether the date systems used in the workbook starts in 1904.
A value of on, 1, or true indicates the date system starts in 1904.
A value of off, 0, or false indicates the workbook uses the 1900 date system, where 1/1/1900 is the first day in the system.
The default value for this attribute is false.
What is so special about 1904 you might ask? This is another legacy problem with Excel, that implementations of Excel on the Mac, for reasons unknown to me, had an internal date origin of January 1st, 1904 rather than January 1st, 1900. This is unfortunate for Microsoft’s Mac Business Unit, and has likely been a source of frustration for them, needing to maintain these two date origins in their internal code.
But why is this my problem? Why should a standard XML format care about what Excel does on the Mac? Why should it care about any vendor’s quirks? If RobOffice (a fictional example) wants to internally use a date origin of March 15th, 1903 then that is my business. In my implementation I can do whatever I want. But when it comes to writing a file format standard, then the caprices of my implementation should not become a requirement for all other users of the file format. Further, if I cannot make up my mind and choose a single date origin then my indecisions should not cause other implementations to require extra code because of my indecision.
So there you have it, two ways in which Microsoft has created a needlessly complicated file format, and made your life more difficult if you are trying to work with this format, all to the exclusive advantage of their implementation. I wish I could assure you that this is an isolated example of this approach in OOXML But sadly, it is the rule, not the exception.