There are a few things that a spreadsheet can get wrong. You can be hard to use. You can be ugly. You can be slow. You can even crash occasionally. These are all annoyances. With a monopoly, annoyances are a way of life. However, there is one place where a spreadsheet can never be wrong, and that is in the area of calculations, especially financial calculations. A business uses spreadsheets to model and track its business. It makes decisions based on these spreadsheets. If the spreadsheet gives the wrong answer, then the business makes the wrong decision. And a business that makes the wrong decisions does not stay in business for long.
So it was with much lament last year that I reported that OOXML, then under ballot in ISO/IEC JTC, had many egregious errors in its spreadsheet formula definitions. In addition to enumerating these errors in my blog, I submitted them for consideration by INCITS V1, the US SC34 mirror committee, and these became part of the bundle of comments with which the US accompanied its ballot.
Although the ballot wasn’t due until September 2nd, the extravagant “sit and do nothing” provisions in INCITS lead to our technical review being cut off in July. Because of the lack of review time, I was not able to do an in-depth review of all the spreadsheet functions, but only a cursory review. But the existence of such errors as I did identify, in the already-approved Ecma Standard, was disquieting. It should have lead Ecma TC45 to conduct a more through review of the spreadsheet functions. But this did not appear to happen.
When we received the response from Ecma to the NB comments, in January, INCITS V1 was asked to go through all the responses (over 1,000 of them) to determine whether they were acceptable. This review period was again insufficient.
As for the BRM in February, this was a travesty, as I and others have noted.
I have a theory concerning committees. A committee may have different states, like water has gas, liquid or solid phases, depending on temperate and pressure. The same committee, depending on external circumstances of time and pressure will enter well-defined states that determine its effectiveness. If a committee works in a deliberate mode, where issues are freely discussed, objections heard, and consensus is sought, then the committee will make slow progress, but the decisions of the committee will collectively be smarter than its smartest member. However, if a committee refuses to deliberate and instead merely votes on things without discussion, then it will be as dumb as its dumbest members. Voting dulls the edge of expertise. But discussion among experts socializes that expertise. This should be obvious. If you put a bunch of smart people in a room and don’t let them think or talk, then don’t expect smart things to happen as if the mere exhalation of their breath brings forth improvements to the standard.
So the the BRM ended a different committee than it started, and the mode of operation it was lead into caused it to act like a very stupid committee indeed. I don’t say this to be accusatory. I’m just making an observation about crowd behavior. When a committee of experts ceases to be a deliberate committee, then you will achieve subpar results.
One of the ways the BRM was stupid is that it approved changes to OOXML that have totally broken SpreadsheetML’s financial calculations, rendering the resulting calculations both mathematically incorrect as well as inconsistent with what Excel actually calculates. More about this later.
Here be dragons.
One of the persistent problems with OOXML has been in the area of “day count conventions” as used in SpreadsheetML.
Why should day counting be complicated? Just count how many days between the two dates and you are done, right? Indeed, everywhere but in finance it is simple. Some of the complications are for historical reasons, to try to make hand calculations easier in the pre-computer era. Also, these conventions made the calendar more regular, so financial instruments were less distorted by calendar irregularities like leap years or variable length months. If you assume a year is exactly 360 days in 12 months with 30 days each, then some things in life are simpler. Of course, this makes other things in life more complicated, including defining spreadsheet functions.
The devil is in the details. You must get these count conventions right. Being wrong by just one day in a year may only be 0.3%, but in a million dollar transaction, that is $3000. Not many people can afford to routinely ignore a $3000 error.
Different financial organizations have developed their own different day count conventions. We have the Banker’s Rule, the LIBOR rule, the NASD rule, the ISDA rule, ICMA rule, SIFMA rule, etc.
In Excel, these conventions are defined by a function parameter called the “basis” and this basis is used by many financial spreadsheet functions. In fact, none of the following functions are adequately defined unless the day count basis values are adequately defined:
There are five basis conventions defined in OOXML, with values 0-4:
- Basis 0 = US 30/360
- Basis 1 = Actual/Actual
- Basis 2 = Actual/360
- Basic 3 = Actual/365
- Basic 4 = European 30/360
In this convention X/Y indicates that X is the assumption for month length, and Y is the assumption for year length. Note that there is no external authority given as a reference for any of these conventions. So it is not clear, whether the Actual/Actual convention is following, for example ICMA rules or ISDA rules. In fact, it may be following neither rule. So lacking any external authority, all we have to go on are the definitions provided in the specification itself. OOXML is its own authority for day count conventions.
As I reported last July, the definitions provided by OOXML did not sufficiently define the behavior of the conventions that lie behind most of the financial functions in Excel. Unfortunately, Microsoft/Ecma have failed to fix this problem in their proposed Resolutions, and in fact have made it worse. Further, the DIS 29500 BRM, in its negligent bulk approval of Ecma’s responses merely advanced these serious errors into the text which was then approved as an International Standard. So essentially, my work in pointing out errors in the spreadsheet language was for naught. Microsoft just shoved it all though anyways.
This puts me in a delicate situation. On the one hand, the ODF TC really would like to finish its work on ODF 1.2, and part of that is completing the OpenFormula work. A key remaining part of OpenFormula is to ensure that our financial functions synch up with how Excel works. What OOXML says is irrelevant, except to the extent that it accurately tells how Excel defines these functions.
However, inquiries to Ecma on these day count conventions, inqueries made months ago, have received no response. Also, the final DIS text of OOXML has not been made available, not even to SC34 members and liaisons. And there is not mechanism in place yet in SC34 for collecting defect reports on OOXML. So we’re stuck.
Or maybe not.
David Wheeler, Chair of the OpenFormula subcomittee of the OASIS ODF TC, has been trying to nail down the behavior of Excel’s spreadsheet functions for over a year now. One of the last remaining pieces is to nail down the day count conventions. After waiting and waiting for this to be clarified in OOXML, David took matters into his own hands and decided to solve the problem by brute force, enumerating millions of test cases, indeed a comprehensive set of date pairs over a 6 year period, to try to determine exactly how the date bases in Excel work. You can read David’s conclusions on his blog.
What strikes me in David’s report is that not only are the OOXML definitions incomplete and inconsistent, but they do not accurately reflect what Excel actually calculates. So either Excel is wrong, or the OOXML standard is wrong when calculating almost every financial spreadsheet function. This is quite an embarrassment for an ISO standard, and an unnecessary one, since I have been talking about how poorly defined these functions are for almost a year now.
Let’s work through some of David Wheeler’s test cases by hand, to get a better feel for how OOXML is broken.
Let’s take the YEARFRAC() function as the simplest example. YEARFRAC() takes two dates and a basis value as inputs, and returns the fraction of a year that is between those two dates. So logically, the calculation is like:
YEARFRAC = (interval between start-date and end-date) / length of year.
That is the logical definition. The only complication is the 5 date bases, and what exactly the year length is. This last point is something that stuck out in my mind when I first reviewed the draft of OOXML last summer. You might think the year length is 365 days. But what about leap years? And what about date ranges that straddle normal years and leap years? This is the key fact that this function requires in is definition. This is the problem I reported on my blog last July. This was the problem INCITS V1 submitted with our ballot comments last September. This is the problem that Ecma responded to in a severely flawed way in January. This is the problem that the BRM refused to discuss and merely agreed with Ecma’s flawed changes. And this is the problem that is now in the final DIS version of OOXML.
Take a look for yourself in this brief extract from the final DIS version [pdf] of OOXML, provided for purposes of critical review and commentary.
(Yes, I now have a complete copy of the final DIS version of OOXML. If you think that this is unfair — and I would agree with you on that — then maybe you should ask ITTF why I was able to get a copy of the final DIS, but no one else in SC34 was.)
Let’s take a look at the ISO OOXML definitions and try some test calculations to reproduce some of David’s findings.
First, let’s take the date basis 1 first, Actual/Actual, since that is the easiest. But we immediately run into a problem. The standard says two different things. In the “description” table it says:
Actual/actual. The actual number of days between the two dates are counted. If the date range includes the date 29 February, the year is 366 days; otherwise it is 365 days.
However, later, when defining the return value, the standard says this:
If the Actual/actual basis is used, the year length used is the average length of the years that the range crosses, regardless of where start-date and end-date fall in their respective years.
There is absolutely no way in which these can both be correct. This would have been easily fixable at the BRM, if the BRM had been allowed to do its job. But I wasn’t even allowed to open my mouth and point out this problem. So now this fatal ambiguity sits in the text of OOXML, as authorized by the BRM experts and approved by JTC1 NB’s. Gotta love it.
But let’s forge on and assume that we really have two algorithms, the first (from the description) which we will call basis 1, and the second basis (from the return value section) which we will call 1′ (“one-prime”). We’ll calculate it both ways.
Let calculate YEARFRAC(DATE(2000,1,1),DATE(2001,1,1),1)
With basis 1, this is simple. 2000 was a leap year (since it was a century divisible by 400) so the interval between the two dates is 366 days. Similarly, since the date range includes a February 29th, the length of the year used in the calculation is 366. So the returned value returned by YEARFRAC should be 366/366 or 1.0.
With basis 1′, this is also simple. The date interval is still 366. But the length of the year is now the average of the year lengths crossed by the date range. So the average of 366 (for 2000) and 365 (for 2001), or an average of 365.5. So using basis 1′, YEARFRAC should return 366/365.5 or 1.00137.
Let’s try another, with basis 1. What is YEARFRAC(DATE(2000,1,1),DATE(2002,1,1),1) ?
In basis 1, the interval is 731 days (366 days in 2000 plus 365 days in 2001). The year length is 366, since the interval includes a February 29th. So YEARFRAC should return 731/366 = 1.997.
With basis 1′, the interval is also 731 days, but the year length is the average of 366 (for 2000), 365 (for 2001) and 365 (for 20002). So YEARFRAC should return 2.0009
Just to make sure we have this down, let’s try another example:
What is YEARFRAC(DATE(2000,1,1),DATE(2000,1,2),1) ?
A one day interval? Yes, please humor me.
OK. With basis 1, the interval is one day. Since it does not cross February 29th, the year length is 365 days. So it will return 1/365 or 0.0028.
With basis 1′ the interval is also one day, and the year length is 366, So it will return 1/366 or 0.0027
OK. That was too easy. One more example, to make sure that you have it down.
What is YEARFRAC(DATE(2000,1,1),DATE(2004,1,31),1) ?
Hmm…. this one will require more thought. I might have to take off my shoes and count using my toes as well.
With basis 1, the interval is 1491 days = 366 + 365 + 365 +365 +30.
The year lengths are:
2000=366 (since the range cross February 29th)
2004=365 (since the date range does not cross February 29th)
So we have 4 full years plus 30 days of a 365 day year. YEARFRAC should return 4 + 30/365 = 4.0822.
With basis 1′ we treat the 2004 as having 366 days and average the years in the interval, so average year length = (366+365+365+365+366)/5 = 365.4. So YEARFRAC should return 4 + 30/365.4 = 4.0821.
Now that we’re done with the examples, we can throw them into a table, and compare them to what Excel 2007 calculates for these same parameters:
|start-date||end-date||basis||ISO Value||Excel’s Value||Excel Correct?|
Let your mind linger on these results for a bit. Let it sink in. Look at this table until you recognize its significance and cringe in disgust. In some cases, Excel seems to be using the first definition of date basis 1. In other cases it is using the second definition of date basis 1. And in one case it is using neither definition. In other words, this is a lot more screwed up than at first it appears. This is not just a simple ambiguity. The OOXML definition of date basis 1 is totally wrong.
That is just the Actual/Actual date basis. The other 4 conventions are more complicated. I encourage you to read David’s write up in full to see how 3 of 5 basis conventions defined in OOXML differ from what Excel actually calculates. David also shows how he believes Excel really calculates these day count conventions, based on his extensive tests.
Now if this was just a matter of one function in Excel, just YEARFRAC(), then this would not be a big deal. But this is flaw is inherited into most financial functions in OOXML. Let’s take an example at random, DISC(), which calculates the discount rate for a security, given settlement and maturity dates, as well as par and redemption values. You can read the definition of DISC() from the final DIS text here [pdf].
You don’t need to be an Wall Street quantitative analyst to see some obvious problems here. First, in the formula given, the 2nd term is divided by “SM”. What is SM? There is no “SM” defined here. There is a “DSM” defined, however. Is that what is meant? Let’s assume so.
We can try a test calculation, using the example given in the text of OOXML:
What is the value of: DISC(DATE(2007,1,25),DATE(2007,6,15),97.975,100,1) ?
- B = 365, since the date range does not include February 29th. (Note that DISC has a single definition for year length in basis 1, not the two conflicting definitions we saw in YEARFRAC)
- DSM = 141 days = 6 in Jan+28 in Feb+31 in March+30 in April+31 in May+15 in June
- par = 97.975, which was our input parameter
- redemption = 100, which is another one of our input parameters
You can do the calculation given in the text, and find the return value of DISC should be 5.35037%. Plug the example into Excel and you will get the answer 5.24202%.
So Excel is off by 2% or so. Do we really care. It’s just money, right?
The problem is that the function in OOXML is defined incorrectly, from the financial perspective. The discount rate is the discount from the redemption value, not the discount from the purchase price. So the first term in the formula should be (redemption-par)/redemption, not (redemption-par)/par. If you make this change, then the calculated value matches the value Excel gives.
Does anything strike you as odd in the above? Do you have a chill running down your spine? Do you have renewed sense of dread? You should, because I just illustrated another grave problem with the OOXML standard: The spreadsheet examples in OOXML are a fraud.
You might have mistakenly been reassured by these numerous examples in the spreadsheet formula, that these actually had some relationship to the standard, that they were examples of how the calculations should be done, that they were evidence of some sort of quality assurance, that they may even be of assistance to an implementor to see whether they implemented the function correctly.
But they aren’t.
What would be normal practice would be to take the definitions, as given in the OOXML text, and to calculate the values according to the definition provided in the text, and then to compare the resulting values with what Excel returns. That would provide a useful double-check of the definitions in the text. But OOXML doesn’t do that. The examples here are mere fluff.
The discrepancy here also indicates that no one has actually reviewed these formulas for accuracy. Errors like this are immediately evident, but only if you look. The fact that things like this have escaped the notice of Microsoft, Ecma TC45, their Project Editors, 80 NB reviews, the BRM experts, and the eagle eyes of ITTF, should make one have considerable concerns over the the sufficiency of the Fast Track review and approval process.
Let’s try one more example before we wrap this up.
What is the value of: DISC(DATE(2000,1,1),DATE(2004,1,31),97.975,100,1) ?
- B = 366, since the date range includes February 29th
- DSM = 1491 days = 366 + 365 + 365 +365 +30.
- par = 97.975, which was our input parameter
- redemption = 100, which is another one of our input parameters
So, using the stated formula in the OOXML text, the answer would be 0.50736%. Using the “correct” formula, in the financial sense, the answer would be 0.49708%. What does Excel say the value is? Well, 0.49627%, of course.
So with the DISC() function we found:
- The given text provided a formula that referred to a non-existant “SM” variable. This appears to be a cut & paste error.
- After accounting for that, we found that the formula was incorrect according to recognized financial standards. Securities are discounted from their redemption values, not from their purchase prices.
- Even correcting for the formula errors, we find that the given definition of DISC() does not match what Excel returns, due to errors and ambiguities in the day count conventions, errors that David Wheeler delves into more deeply in his report.
- The examples given in the standard are bogus. They are not actually examples of the defined function.
- Excel does not implement OOXML.