How many defects remain in OOXML?

2008/03/18 By Rob 54 Comments

DIS 29500, Office Open XML, was submitted for Fast Track review by Ecma as a 6,045-page specification. (After the BRM, it is now longer, maybe 7,500 pages or so. We don’t know for sure, since the post-BRM text is not yet available for inspection.) Based on the original 6,045-page length, a 5-month review by JTC1 NB’s led to 48 defect reports by NB’s, reporting a total of 3,522 defects. Ecma responded to these defect reports with 1,027 proposals, which the recent BRM approved, mainly through one big overnight ballot.

So what was the initial quality of OOXML, coming into JTC1? One measure is the defect density, which we can say is at least one defect for every 6045/1027 = 5.9 pages or so. I say “at least” because this is a lower bound. If we believed that the 5-month review represented a complete review of the text of DIS 29500, by those with relevant subject matter expertise, then we would have some confidence that all, or at least most, defects were detected, reported and repaired. But I don’t know anyone who really thinks the 5-month review was sufficient for a technical review of 6,045 pages. Further, we know that Microsoft worked actively to suppress the reporting of defects by NB’s. So the actual defect density is potentially quite a bit higher than the reported defect density.
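For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The first figure is the pages-per-proposal number used above. The second, which divides by the 3,522 reported defects instead, is only a comparison I am adding for illustration and is not a figure from the analysis above.

```python
# Figures quoted in the post.
PAGES = 6045              # length of the original DIS 29500 submission
ECMA_PROPOSALS = 1027     # Ecma proposals approved at the BRM
REPORTED_DEFECTS = 3522   # defects reported by NBs


def pages_per_item(pages: int, items: int) -> float:
    """Average number of pages per counted item (proposal or defect)."""
    if items <= 0:
        raise ValueError("items must be positive")
    return pages / items


# Density as computed in the post: pages per Ecma proposal.
print(f"{pages_per_item(PAGES, ECMA_PROPOSALS):.2f} pages per Ecma proposal")

# Illustrative alternative (an assumption, not the post's measure):
# pages per defect as reported by the NBs.
print(f"{pages_per_item(PAGES, REPORTED_DEFECTS):.2f} pages per reported defect")
```

Either way the number is a floor on the true density, since it counts only what was reported.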

But how much higher? This is the important question. It doesn’t matter how many defects were fixed. What matters is how many remain.

There are several approaches to answering this question. One approach is to look at defect “find rates”, the number of defects found per unit of time spent reviewing, fit that to a model, typically an S-curve (sigmoid), and use that model to predict the number of defects remaining. However, we have no time/effort data for the DIS 29500 review, so we don’t have enough data to create that model. Another approach is to randomly sample the post-BRM text and statistically estimate the defect density from this sample.

Are there any other good approaches?

Here is the plan. I will use the second approach. Since I do not actually have the post-BRM text, I need to make some adjustments. I’ll start with the original text, in particular Part 4, the XML reference section, at 5,220 pages, where the meat of the standard is. I’ll then create a spreadsheet and generate 200 random page numbers between 1 and 5,220. For each random page I will review the clause associated with that page and note the technical and editorial errors I find. I will then check these errors to see if any of them were addressed by BRM resolutions.
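I used a spreadsheet for the random numbers, but the same draw can be sketched in a few lines of Python. This is an illustration only: the function name and the seed are made up for the example, and it will not reproduce my actual list. It draws each page independently, so the same page can come up twice, which is what happened with page 1415 in the list I publish further down.

```python
import random
from typing import List, Optional

PART4_PAGES = 5220   # length of Part 4, from the post
SAMPLE_SIZE = 200    # target number of pages to review


def draw_pages(n: int, last_page: int, seed: Optional[int] = None) -> List[int]:
    """Draw n page numbers uniformly from 1..last_page, with replacement."""
    rng = random.Random(seed)  # a fixed seed makes the list repeatable
    return [rng.randint(1, last_page) for _ in range(n)]


# The seed value is arbitrary (hypothetical); any integer works.
pages = draw_pages(SAMPLE_SIZE, PART4_PAGES, seed=2008)
print(pages[:10])
```

Publishing the list in advance, as I do below, lets anyone verify that pages were not cherry-picked after the fact.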

Based on the above, I will be able to estimate two numbers:

The defect density of the text, both pre and post BRM
The fraction of defects which were detected by the Fast Track review.

So if I find N defects, and 0.9N of those issues were already found during the Fast Track review and were addressed by the BRM, then we can say that the Fast Track procedure was 90% effective in finding and removing errors. Some practitioners would call that the defect removal “yield” of the process. But if we find that only 0.1N of the errors were reported and addressed by the BRM, then we’ll have a different opinion on the sufficiency of the Fast Track review.
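The yield calculation is simple enough to write down. A minimal sketch, with a hypothetical function name and made-up counts that match the 0.9N and 0.1N cases above:

```python
def removal_yield(found: int, addressed: int) -> float:
    """Fraction of sampled defects that the review had already caught and fixed."""
    if found <= 0:
        raise ValueError("need at least one sampled defect")
    if not 0 <= addressed <= found:
        raise ValueError("addressed must be between 0 and found")
    return addressed / found


# Illustrative counts only: N = 100 sampled defects.
for addressed in (90, 10):
    print(f"{addressed} of 100 addressed: yield = {removal_yield(100, addressed):.0%}")
```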

Clear enough? Microsoft is claiming something like 99% of all issues were resolved at the BRM. So let’s see if we get anything close.

I’m not done with this study yet. I’m finding so many defects that recording them is taking more time than finding them. But since this is topical, I will report what I have found so far, based on the first 25 random pages, or 1/8th completion of my target 200. I’ve found 64 technical flaws. None of the 64 flaws were addressed by the BRM. Among the defects are some rather serious ones such as:

Storage of plain-text passwords in database connection strings
Undefined mappings between CSS and DrawingML
Errors in XML Schema definitions
Dependencies on proprietary Microsoft Internet Explorer features
Spreadsheet functions that break with non-Latin characters
Dependencies on Microsoft OLE method calls
Numerous undefined terms and features

As I said, this study is still underway. I’ll list the defects I’ve found so far, and add to it as I complete the task over the next few days.

Page 692, Section 2.7.3.13 — no errors found
Page 1457, Section 2.15.3.45 — This is a compatibility setting which creates needless complexity for implementers who now must deal with two different ways of handling a page break, one in which a page break ends the current paragraph, and another where it does not. This is not a general need and expresses only a single vendor’s legacy setting.
Page 490, Section 2.4.72 — This defines the ST_TblWidth type, used to express the width of a table column, cell spacing, margins, etc. The allowed values of this type express the measurement units to be used: Auto, Twentieths of a point, Nil (no width), Fiftieths of a percent. I find these choices to be capricious and not based on any sound engineering principle. It also mixes units with width values (Nil) and modes (auto). This should be changed to allow measurements in natural units, as XSL-FO and CSS2 do: mm, inches, points, picas. Also, do not mix units, values and modes in the same attribute. Nil is best represented by the value 0 and Auto should be its own Boolean attribute.
Page 328, Section 2.4.17 — The frame attribute description says it “Specifies whether the specified border should be modified to create a frame effect by reversing the border’s appearance from the edge nearest the text to the edge furthest from the text.” This is not clear. What does it mean to reverse a border’s appearance? Are we doing color inversions? Flipping along the Y-axis? What exactly? Also a typographical error: “For the right and top borders, this is accomplished by moving the order down and to the right of its original location.” Should be “moving the border down…” Also, it is not stated how far the border should be moved.
Page 1073, Section 2.14.8 — This feature is described as: “This element specifies the connection string used to reconnect to an external data source. The string within this element’s val attribute shall contain the connection string that the hosting application shall pass to a external data source access application to enable the WordprocessingML document to be reconnected to the specified external data source.” Since connection to external data typically requires a user ID and a password, the lack of any security mechanism on this feature is alarming. The example given in the text itself hardcodes a plain-text password in the connection string.
Page 4387, Section 6.1.2.3 — For the “class” attribute it says “Specifies a reference to the definition of a CSS style.” The example implies that some sort of mapping will occur between CSS attributes and DrawingML. But no such mapping is defined in OOXML. The “doubleclicknotify” attribute implies some sort of event model that is undefined in OOXML. How do you send a message for doubleclicknotify? Why do we describe organization chart layouts here when it is not applicable to a bezier curve? What happens if this shape is declared to be a horizontal rule or bullet or ole object? The text allows you to label it as one of these, but assigns no meaning or behavior to this. Why do we have an spid as well as an id attribute? The “target” attribute refers to Microsoft-specific I.E. features such as “_media”. Although the text says that control points have default values, the schema fragment does not show this.
Page 3164, Section 4.6.88 — This and the following two elements are all called “To” but this seems to be a naming error. 4.6.89 is essentially undefined. What does “The element specifies the certain attribute of a time node after an animation effect” mean? It doesn’t seem to really signify anything. Ditto for 4.6.90.
Page 5098, Section 7.1.2.124 — The example does not illustrate what the text claims it does. The example doesn’t even use the element defined by this clause.
Page 4492, Section 6.1.2.11 — The “althref” attribute is described as “Defines an alternate reference for an image in Macintosh PICT format”. Why is this necessary for only Mac PICT files? Why would “bilevel” necessarily lead to 8 colors? We’re well beyond 8-bit color these days. The “blacklevel” attribute is defined as “Specifies the image brightness. Default is 0.” What is the scale here? This needs to be defined. Is it 0-1.0, 0-255 or what? And what is “image brightness” in terms of the art? Is this luminosity? Opacity? Is this setting the level of the black point? For “cropleft”, etc. — what units are allowed? (implies %) How does “detectmouseclick” work when no event model is defined? “emboss effect” is not defined. “gain” has the same problem as “blacklevel” — no scale is defined. This element has two different id attributes in two different namespaces, with two different types. The “movie” attribute is described as “Specifies a pointer to a movie image. This is a data block that contains a pointer to a pointer to movie data”. Excuse me? “A pointer to a pointer to movie data”? This is useless. The “recolortarget” example appears to contradict the description. It shows blue recolored to red, not black. The “src” attribute is said to be a URL, yet is typed to xsd:string. This should be xsd:anyURI.
Page 1431, Section 2.15.3.30 — no errors noted
Page 3405, Section 5.1.5.2.7 — The conflict resolution algorithm should be normative, not merely in a note.
Page 875, Section 2.11.21 — Instead of saying that the footnote “pos” element should be ignored if present at the section level, the schema should be defined so as to not allow it at the section level. In other words, this should be expressed as a syntax constraint.
Page 1955, Section 3.3.1.20 — This facility for adding “arbitrary” binary data to spreadsheets is said to be for “legacy third-party document components”. No documentation or mapping for such legacy components has been provided, so interoperability with this legacy data cannot be achieved. Why isn’t this expressed using the extension mechanisms of Part 5 of the DIS?
Page 4526, Section 6.1.2.13 — The “allowoverlap” attribute is not sufficiently defined. In particular, what determines whether the object shifts to the right or left? ST_BWMode is not adequately defined. For example, one option is “Use light shades of gray only”. How light? And what is the difference between “hide” and “undrawn”? Also, the concept of “wrapping polygon” is not sufficiently defined. For example, what is the wrapping polygon for an oval? The purpose of “dgmlayoutmru” is obscure. Wouldn’t the most-recently-used layout option be the one which is actually in use, “dgmlayout”? The “dgmnodekind” attribute is undefined, said to be “application-specific”. Is interoperability not allowed? The text seems to imply that applications must use application-specific values. The “href” attribute is given a string schema type. Shouldn’t this be xsd:anyURI? The “id” attribute is said to be a “unique identifier”. Unique in what domain? Among shapes of this type? Among all shapes? All shapes on this page? Among all ID’s in the document? The “preferrelative” attribute is not sufficiently defined. Where is the original size stored? After what reformatting? This appears to be a specification for runtime behavior, not a storage artifact. But it is not clear what is required. For the “regroupid”, where is the list of these possible id’s? The Hyperlink targets _media and _search are Internet Explorer proprietary features.
Page 1193, Section 2.15.1.39 — no errors noted
Page 1459, Section 2.15.3.46 — no errors noted
Page 2671, Section 3.17.7.150 — no errors noted
Page 2347, Section 3.10.1.69 — An “AutoShow” filter is not defined in this standard, though it is called for in several places of this section. “Average” aggregation function is not defined. In fact, none of these aggregation functions are defined. Although some have common mathematical definitions, in a spreadsheet context it is critical to make an explicit statement on treatment of strings, blanks, empty cells, etc. For dataSourceSort, what type of sort is required? Lexical or locale-sensitive? This element seems to mix field-specific settings, like dragToCol with pivotTable-wide settings like hiddenLevel. This will result in large data redundancy as settings like hiddenLevel are stored multiple times, once for each pivotField. “Inclusive Mode” is not defined. “Measure based filter” is not defined. “AutoSort” mode is not defined. The resolution of pivot table versus cell styles is ambiguous. “If the two formats differ, the cell-level formatting takes precedence.” Is this negotiation done at the level of the entire text style? Style ID? Or at the attribute level? “Outline form” is not defined. “server-based page field” is not defined. (what is a page field?) “member caption” is undefined.
Page 2885, Section 3.18.51 — The values of the given type (ST_OleUpdate) are explicitly tied to the Microsoft Windows OLE2 technology via the two method calls IOleObject::Update or IOleLink::Update
Page 3951, Section 5.5.3.4 — The base values “margin” and “edge” are ambiguous. Is it specifying positioning from the left or right page edge?
Page 2710, Section 3.17.7.200 — The description of “lookup-vector” is insufficient. It seems to be saying that the range should be sorted. Is this really correct? Spreadsheet functions typically do not have side effects. Also, the sorting procedure is explicitly defined only for the Latin alphabet. What about the rest of the allowed Unicode characters, including the C0 control characters which are allowed in SpreadsheetML cell contents? Where are they sorted?
Page 934, Section 2.13.5.5 — The “id” attribute is required to be unique, but it is not specified over what domain it must be unique.
Page 607, Section 2.6.2 — What does “reversing the border’s appearance” mean? How much offset is required for a shadow?
Page 201, Section 2.3.2.19 — This feature allows the suppressing of both spell and grammar checking for a text run. These should be two different settings, one for spelling and one for grammar proofing. There are many cases where it is important to check one, but not the other, such as content composed of sentence fragments, which are not grammatically complete, but where correct spelling is desired.
Page 1240, Section 2.15.1.74 — This setting specifies that the document should be saved into an undefined invalid XML format. But it is not stated how an XSLT transform can be applied to an OOXML document, since OOXML is a Zip file containing many XML documents. So what exactly is the specified XSLT applied to?

That’s as far as I’ve gone. But this doesn’t look good, does it? Not only am I finding numerous errors, these errors appear to be new ones, ones not detected by the NB 5-month review, and as such were not addressed in Geneva. Since I have not come across any error that actually was fixed at the BRM, the current estimate of the defect removal effectiveness of the Fast Track process is < 1/64, or about 1.6%. That is an upper bound. (Confidence interval? I’ll need to check on this, but I’m thinking this would be based on the standard error of a proportion, where SE = sqrt(p*(1-p)/N), making our confidence interval roughly 1.6% ± 3%.) Of course, this value will need to be adjusted as my study continues. However, it is starting to look like the Fast Track review was very shallow and that it detected only a small percentage of the errors in the DIS.
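Here is a rough sketch of that standard-error calculation in Python, plugging in p = 1/64 and N = 64. The 1.96 multiplier for a 95% interval is my assumption, since I have not settled on a confidence level. Bear in mind too that the normal approximation is crude when p is this close to zero, so treat the result as a sanity check rather than a proper interval.

```python
import math


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return math.sqrt(p * (1.0 - p) / n)


n = 64            # defects found so far
p = 1 / n         # upper bound on the fraction addressed by the BRM
se = proportion_se(p, n)
margin = 1.96 * se  # assumed 95% level, normal approximation

# A proportion cannot be negative, so clamp the lower end at zero.
low, high = max(0.0, p - margin), p + margin
print(f"p = {p:.2%}, SE = {se:.2%}, margin = {margin:.2%}")
print(f"approximate interval: {low:.2%} to {high:.2%}")
```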

[20 March Update]

As one commenter noted, the page numbers I’m using above are PDF page numbers, not the page numbers at the bottom of each page. If I used the printed page numbers then I would need to deal with all the Roman numeral front matter pages as an exception. Simpler to just use the one large domain of PDF page numbers.

PDF Page Number = Printed Page Number + 7

I will continue to report new defects, according to the original random number list I generated. I’ll update the statistics every 25.

Here’s some more for today:

Page 4192, Section 5.8.2.20 — “fPublished” attribute is defined as “Specifies whether the shape shall be published with the worksheet when sent to the spreadsheet server. This is for use when interfacing with a document server.” What worksheet? This section is in the DrawingML reference material. Charts could appear in presentations as well. This should not be limited to worksheets. Also what is a “spreadsheet server”? No such technology has been defined in this standard. Also no protocol has been defined for publishing to a spreadsheet server. Is this some proprietary hook for SharePoint? The “macro” attribute allows the storage of application-defined scripts. We are told that the macro “should be ignored if not understood.” However there is no mechanism for determining what language the script is in. How do we know if we understand the macro? Content sniffing? Attempt to execute it and see if we get a runtime error? But by that time, once we find out that we do not understand it, it is too late to ignore the macro. We may have already triggered runtime side effects. What we really need here is some way to declare what scripting language is being used, via a namespace or an additional attribute like “lang”.
Page 3526, Section 5.1.5.4.21 — The “algn” attribute specifies the text alignment. Allowed values include left, right, center, justified, etc. However, what is lacking is “start” and “end” alignment, which are sensitive to writing direction and are part of internationalization best practices, for example, in XSL-FO. When translating a document between RTL and LTR systems, the approach used by OOXML will be harder to deal with and more expensive, since the translator will need to manually adjust styles and not just perform a semi-automated translation.

[End Update]

I’ll continue to review the remaining 173 pages of my random sample and update the numbers and the defect list as I go. If you want to play along at home, the upcoming random page numbers will be:

1039
4933
3334
1993
1632
4787
460
481
4497
310
282
2383
1793
2451
3310
3716
1261
1077
2219
4236
285
3090
737
2370
741
164
5044
364
2272
1377
4512
1410
964
5079
5030
4110
3620
3588
2301
3222
4485
5082
193
3632
985
1593
5155
1054
3371
3717
5015
1071
2965
2294
1809
161
4922
5219
1719
1040
4259
3134
1195
4232
4444
3931
2302
2788
3584
8
5092
2580
1080
1239
1415
1170
1501
151
148
4754
1350
3714
1895
3926
4833
2886
2983
1439
3622
4960
2000
2555
671
2388
352
222
1630
3033
4994
3346
531
2393
482
207
2252
4074
3302
2459
751
1891
1635
3120
2226
1119
810
1728
837
4570
4474
1072
3901
300
4895
1764
2332
619
4392
2112
1653
4339
2384
4566
4085
1171
2238
5144
1399
4157
1352
27
4118
4167
5046
4460
4053
1258
4252
922
3748
1742
458
4448
963
2227
1404
593
4140
1739
1102
1611
3016
2646
3083
5105
747
1142
2596
845
626
4047
1415
5143
3997

Comments

Anonymous says

2008/03/18 at 7:00 pm

Rob,

Those OLE references are of concern. Ecma’s response to DK-0031 was to include faulty Bonobo & KParts examples, which were approved at the BRM.

I’m told now though that the editor is at liberty to remove all OLE references because of the expressed notion (in some responses) that they’ll remove it.

1) Are they at liberty to?
2) Considering that they were not able to successfully remove it during the regular fasttrack process (before Jan 14th) how do we know that they’ll be able to for the final text?

dario says

2008/03/18 at 7:37 pm

“If you want to play along at home”

OK, this is what I found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM
note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

————
Part 4: 2.2.1 background (Document Background) reads:

“themeTint (Border Theme Color Tint)
Specifies the tint value applied to the supplied theme color (if any) for this background.
If the themeTint is supplied, then it is applied to the RGB value of the theme color (from the theme part) to determine the final color applied to the document’s background.
The themeTint value is stored as a hex encoding of the tint value (from 0–255) applied to the current border.

[Example: Consider a tint of 60% applied to a border in a document. This tint is calculated as follows:

Sxml = 0.4 * 255
= 102
= 66(h)

The resulting themeTint value in the file format would be 66. end example]”

Error: the example says “a tint of 60%”, but the formula shows 0.4. All formulas in OOXML should be reviewed and should provide correct numbers.
————

Anonymous says

2008/03/18 at 9:21 pm

# Page 692, Section 2.7.3.13 — no errors found
# Page 1431, Section 2.15.3.30 — no errors noted
# Page 1193, Section 2.15.1.39 — no errors noted
# Page 1459, Section 2.15.3.46 — no errors noted
# Page 2671, Section 3.17.7.150 — no errors noted

Amazing! Just how many pages saying “THIS PAGE INTENTIONALLY LEFT BLANK” does the OOXML specification contain? ;-)

Anonymous says

2008/03/18 at 9:24 pm

About that tint, I would have to bet that it’s merely being unclear, and that they calculate it using the inverse percentage.

That is, the tint is 60%, so the ‘light’ allowed is 40%, and the formula should’ve shown (1.0 – 0.6) instead of a mysterious 0.4.

You’re right that it’s still a defect, though.

dario says

2008/03/18 at 9:34 pm

(…)

“If you want to play along at home”

OK, this is what I found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM
note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

Part 4, Section 5.9.3.4 prSet (Property Set) reads:

“This element holds properties and customizations which are used throughout certain elements in DiagramML.
…
Attributes:

coherent3DOff (Coherent 3D Behavior): Enables or disables the Coherent 3D behavior for styles that specify this property.”

Error/problem: No definition of the “Coherent 3D behaviour” is given ( a full search of the +6000 pages DIS + +2200 pages proposed dispositions + +500 pages of BRM pages was performed ).

DIS 29500 final text should be reviewed to find all the undefined terms, and Microsoft/ECMA should properly define each of them.

(…)

Anonymous says

2008/03/18 at 10:03 pm

Found this:

—-
Part 4, Section 5: throughout

There are +40 occurrences of the term “DiagramML” in Part 4. Examples:

“5.9.3.4 prSet (Property Set)

This element holds properties and customizations which are used throughout certain elements in DiagramML. The properties can be grouped into the following general categories.”

“5.9.7.50 ST_PtType (Point Type)

This simple type defines the different point types which can be utilized to create diagrams in DiagramML.”

“5.9.3.3 cxnLst (Connection List)

This element defines a group of connections. There can be a connection list defined for any data model which holds all of the connections between points defined in the diagram.

[Example: Consider the following example of a cxnLst in DiagramML:
…”

But Part 4 and Part 3-Primer don’t define what comprises “DiagramML” and don’t mention it as a member of the Office Open XML family of XML schemas.

If this is a typographical error, then it should be corrected. Otherwise DiagramML should be in a separate clause and be defined in Part 3-Primer and Part 4, clarifying how it relates to DrawingML.
—-

dario says

2008/03/18 at 10:16 pm

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM
note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review:

Part 4: 2.7.3.17 style (Style Definition) reads:

“…
General style properties refers to the set of properties which can be used regardless of the type of style; for example, the style name, additional aliases for the style, a style ID (used by the document content to refer to the style), if style is hidden, if style is locked, etc

[Example: Consider a style called Heading 1 in a document as follows:
<w:style w:type="paragraph" w:styleId="Heading1">
<w:name w:val="Heading 1"/>
<w:basedOn w:val="Normal"/>
<w:next w:val="Normal"/>
<w:link w:val="Heading1Char"/>
<w:priority w:val="1"/>
<w:qformat/>
<w:rsid w:val="00F303CE"/>
…
</w:style>

Above the formatting information specific to this style type are a set
of general style properties which define information shared by all style types. end example]
“

This example intends to “clarify” what is a general property.

But there is no indication of which of the w:style attributes or w:style’s child elements are general properties.

Besides this, this example is duplicated word by word with the example given in the “main” definition of a General Style Property ( 2.7.3 ):

“2.7.3 General Style Properties
General style properties refer to the set of properties which can be used regardless of the type of style.

…
[Example: Consider a style called Heading 1 in a document as follows:
<w:style w:type="paragraph" w:styleId="Heading1">
<w:name w:val="heading 1"/>
<w:basedOn w:val="Normal"/>
<w:next w:val="Normal"/>
<w:link w:val="Heading1Char"/>
<w:priority w:val="1"/>
<w:qformat/>
<w:rsid w:val="00F303CE"/>
…
</w:style>

Above the formatting information specific to this style type are a set
of general style properties which define information shared by all style types. end example]
“

So, a reference could be given in 2.7.3.17 to 2.7.3 to avoid unnecessary duplication of text.

Given the size of DIS 29500, and for the sake of clarity and ease of reading and understanding, all examples of this kind ( duplicated, or of little value to the reader ) should be reviewed, re-evaluated and eventually removed from DIS 29500.

A final text incorporating these edits should be submitted to NBs for review.

–Dario

dario says

2008/03/18 at 10:41 pm

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM
note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

Part 4, Section 4.2.3 ext (Extension) reads:

“This element specifies an extension that is used for future extensions to the current version of DrawingML. “

The reference to DrawingML seems extraneous to this section ( Section 4 is about PresentationML ).

If this is a typographical error it should be corrected, otherwise the definition of the “ext” element should be clarified.

Errors like this seem to show that DIS 29500 has been subject to a furious and rushed abuse of “copy and paste”, improper for a text that expects to be awarded the ISO brand. All the text should be carefully reviewed and errors of this kind corrected.

ISO should warn standards organizations that submit text with so many editorial and technical errors, because this shows a lack of respect for the ISO national body members that must review such gobbledegooked text.

Anonymous says

2008/03/18 at 10:53 pm

Found this

Part 4, Section 5.1.2.1.14 ext (Extension) reads:

“This element specifies an extension that is used for future extensions to the current version of DrawingML. This allows for the specifying of currently unknown elements in the future that will be used for later versions of generating applications.
..
Attributes Description

uri (Uniform Resource Identifier): Specifies the URI, or uniform resource identifier that represents the data stored under this tag. The URI is used to identify the correct ‘server’ that can process the contents of this tag.

The possible values for this attribute are defined by the XML Schema token datatype.

The following XML Schema fragment defines the contents of this element:

<complexType name="CT_OfficeArtExtension">
<sequence>
<any processContents="lax"/>
</sequence>
<attribute name="uri" type="xsd:token"/>
</complexType>”

It is not clear what it means “to identify the correct ‘server’ that can process the contents of this tag”.

The example references “Office Art”, which is a proprietary ( and possibly patented ) feature of a Microsoft Office product ( http://www.microsoft.com/technet/archive/office/office97/support/sr1off97.mspx?mfr=true ).

If this extension mechanism is intended as an interoperability mechanism, then it should be clarified how to achieve it and what type of servers are expected to process the tag’s content.

orlando says

2008/03/18 at 10:58 pm

From http://elot.ece.ntua.gr/te48/ooxml/brm-clarifications:

“The third issue is that, while writing my proposal, I and my reviewers found 13 additional errors in the original specification. However, national bodies were not allowed to submit new comments (and rightly so, otherwise there would have been total chaos). Therefore, there was no way to submit and correct them.”

dario says

2008/03/18 at 11:07 pm

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM ( nor in the +2200 pages of ECMA fixes )

note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

Part 3, Section 5.3.2.22 Table Style:

There are three references to what seems to be the same element, but under distinct names, in this normative section:

“band2Vertical” in line 5 of page 277
“band2Vertial” in line 6 of page 277
“band2V” in line 34 of page 277 ( in a XML fragment )

The same goes for the “band1Vertical”, “band2Horizontal” and “band1Horizontal” elements.

If these are distinct elements, then they should be defined. Otherwise, the element names and the corresponding schema should be corrected.

Anonymous says

2008/03/18 at 11:10 pm

Part 3, Section 5.15.6.1.8 Parameter ID reads:

“bkPtFixedVal – specifies where the sname should break if bkpt is set to fixed”

No definition of the term “sname” was found in part 3 ( a full search of the +6000 pages DIS + +2200 pages proposed dispositions + +500 pages of BRM pages was performed ).

DIS 29500 final text should be reviewed to find all the undefined terms, and Microsoft/ECMA should properly define each of them.

Anonymous says

2008/03/18 at 11:12 pm

Found this ( not fixed nor corrected at BRM )

Part 3, Section 5.8.4 Fills reads:

“The se types describe the general structure of all fills;”

Can’t understand what this text means.

All the text should be reviewed ( don’t rush please ! ) and properly corrected.

dario says

2008/03/18 at 11:19 pm

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM nor ECMA +2200 pages fixes document.

note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review.

Part 4, Section 2.11.17 numFmt (Footnote Numbering Format) reads:

“This element specifies the numbering format which shall be used to determine the footnote or endnote reference mark value for all automatically numbered footnote and endnote reference marks (those without the suppressRef attribute set).”

The definition of the suppressRef attribute was not found in Part 4 nor in any schema ( a full search of the +6000 pages DIS + schemas + +2200 pages proposed dispositions + +500 pages of BRM pages was performed )

Same problem at Part 4, 2.11.18 numFmt ( Endnote Numbering Format ).

DIS 29500 final text ( still not provided by ECMA ) should be reviewed to find all the undefined terms.

Reply
Anonymous says

2008/03/18 at 11:26 pm

Found this ( not fixed in BRM nor ECMA +1000 dispositions )

Part 4, Section 2.3.1.41 textDirection (Paragraph Text Flow Direction) reads

“…This element specifies the direction of the text flow for this paragraph.

If this element is omitted on a given paragraph, its value is determined by the setting previously set at any level of the style hierarchy (i.e. that previous setting remains unchanged). If this setting is never specified in the style hierarchy, then the paragraph shall inherit the text flow settings from the parent section.

[Example: Consider a document with a paragraph in which text should flow bottom to top vertically, and left to right horizontally. This setting would be specified with the following WordprocessingML:

<w:pPr>
<w:textFlow w:val="btLr" />
</w:pPr>

The textFlow element specifies via the btLr value in the val attribute that the text flow should go bottom to top, and left to right. end example]”

The text and examples mention a textFlow element, but the clause is about a textDirection element.

The text and/or examples and/or schema should be corrected.

ECMA should be banned for one year from submitting fast-track DIS, and should be warned not to submit such poor, copied-and-pasted specifications derived from internal Microsoft product documentation.

Reply
Anonymous says

2008/03/18 at 11:31 pm

Part 4, Section 5.1.12.60 ST_TextAnchoringType (Text Anchoring Types) reads:

“This is different than ‘anchorJustified'”

No definition of the term ‘anchorJustified’ was found in Part 4 ( a full text search was performed on the +6000 original DIS + 2250 pages of ECMA fixes + +500 pages of BRM fixes )

The term should be defined or corrected.

Reply
Steven G. Johnson says

2008/03/18 at 11:57 pm

Joining in the “find a flaw” game with just one of your listed pages (from ECMA 376 part 4). (The following has not been checked against BRM resolutions.)

p. 3997, sec. 5.7.2.13-14:

5.7.2.13 bandFmt (Band Format)
This element specifies the formatting band of a surface chart.

5.7.2.14 bandFmts (Band Formats) This element contains a collection of formatting bands for a surface chart indexed from low to high.

A “bandFmt” consists of an index (idx) and a shape property (spPr, which defines things like fill pattern and outline). However, nowhere is it clearly defined how an implementation is to display a “surface chart” from a collection of “formatting bands”.

In particular, the “bandFmts” are used in both 5.7.2.204 (surface3Dchart) and 5.7.2.205 (surfacechart). In the latter (2d) case, I suppose one might guess that the “band formats” are a sequence of shapes to be drawn, one on top of the other, to form a 2d contour chart, but this is not specified explicitly. In the former (3d) case, I have a much harder time trying to infer how the “formatting bands” are to be drawn.

From what perspective is the 3d chart drawn, for example? Is a scene3d child element (5.1.4.1.26) required to be present in the spPr in this case, and if not, what is the default? Apparently not specified. And what in the world should one do if different “formatting bands” of the same chart have different scene3d children!? Even if a scene3d child is present, what 3d perspective (e.g. orthogonal projection?) is used? This is apparently set by 5.1.12.47 (preset camera type), but a cursory inspection of that section reveals perspectives that are grossly underspecified, and seem to be each “defined” largely by a single example image. (For example, “legacy oblique top” and “perspective below” are hardly sufficient to define the precise viewing angles, vanishing points, etcetera. Furthermore, wouldn’t it be better to just define those quantities numerically rather than have a finite number of underspecified presets, much less “legacy” presets? Compare, e.g., how OpenGL perspective cameras are defined by a position and 6 numbers.)

Or, for example, what does the “wireframe” boolean attribute (which “specifies the surface chart is drawn as wireframe”, 5.7.2.231) really mean in terms of the visual appearance of the chart? Not explained (and there are multiple reasonable ways to draw 3d wireframes, e.g. as a set of disjoint 3d contour lines, or tessellated into rectangles, or triangles, or…).

One could go on and on….it doesn’t seem possible for two implementations to display surface charts from the same file, especially 3d surface charts, in the same way based only on this specification, without referring to one another’s implementation.

In searching for information on “band formats”, I found another apparent goof:

5.7.2.144 pivotFmts (Pivot Formats) This element contains a collection of formatting bands for a surface chart indexed from low to high.

Except that “pivotFmts” aren’t used in surface charts, they are a child of “chart” (5.7.2.27), and contain a list of “pivotFmt” elements which are a “set of formatting to be applied to the chart that is based on a pivotTable” (5.7.2.143). So 5.7.2.144 seems to be misdescribed (copy-and-pasted from 5.7.2.14?).

It’s a horrifying exercise to go through the ECMA 376 document while pretending it will be your job to decipher and actually implement these features. Even starting at a random page (3997, from Rob’s list), one finds a neverending chain of vagueness and slapdash engineering.

In practice, it seems practically impossible for an implementer to proceed without continually checking how MS Office interprets these Mycenaean scratchings.

Reply
Anonymous says

2008/03/19 at 3:27 am

Funny! This could become a new family game, “The DIS29500 errors quest”.

“Look daddy, I got 5 in my page!”

Reply
Anonymous says

2008/03/19 at 4:13 am

A simple calculation gives us an expected 448 errors in the remaining 175 pages and 15,475 technical errors in all 6045 pages.

The 5 day BRM discussed and voted on 63 dispositions (Andy Updegrove’s blog). We can calculate that it will take around

15,475 / (63 / 5) = 1228 days

to vote on all technical defects. As people will be able to work for 200 odd days per year, it will take some 6 years to resolve all the technical errors.
(/sarcasm)

Winter
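For anyone who wants to re-run the arithmetic above, here is a minimal Python sketch. All the inputs are the figures quoted in the comment (15,475 estimated defects, 63 dispositions in 5 days, roughly 200 working days per year); the variable names are just illustrative.

```python
# Back-of-the-envelope check of the estimate in the comment above.
estimated_defects = 15_475      # estimated technical errors in all 6045 pages
dispositions_voted = 63         # dispositions discussed and voted on at the BRM
brm_days = 5                    # length of the BRM in days
working_days_per_year = 200     # "200 odd days per year"

rate_per_day = dispositions_voted / brm_days          # dispositions handled per day
days_needed = estimated_defects / rate_per_day        # days to vote on every defect
years_needed = days_needed / working_days_per_year    # converted to working years

print(f"{days_needed:.0f} days, about {years_needed:.1f} years")
```

The result agrees with the 1228 days and "some 6 years" given above, under the (tongue-in-cheek) assumption that every defect gets the same treatment as the 63 dispositions discussed at the BRM.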

Reply
dario says

2008/03/19 at 8:39 am

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM nor ECMA +2200 pages fixes document.

note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

Part 4: 2.7 Styles reads:

“Each style defined within a WordprocessingML document requires a style definition. The style definition contains all of the information needed by a consumer to store and display that style within a WordprocessingML document, and is defined using the style element.”

This text is confusing: the consumer doesn’t need to “store” the style in a WordprocessingML document, because he is “consuming” the style from the WordprocessingML document.

The text should be corrected, either mentioning the “producer” or, if the paragraph’s text is only applicable to consumers, deleting the words “store within a WordprocessingML document”.

Reply
Anonymous says

2008/03/19 at 8:45 am

Found this ( not fixed in BRM nor ECMA reflushing of DIS 29500 )

Part 4: 2.7 Styles reads:

“Within a WordprocessingML file, styles are predefined sets of table, numbering, paragraph, and/or character properties which can be applied to text within the document.”

This definition is incomplete: according to WordprocessingML, styles can be applied not only to text but also to other WordprocessingML objects, e.g. a table style could be applied to a table with only graphics and no text in each cell. In 2.7.3.17 a different and more appropriate definition is given:

“2.7.3.17 style (Style Definition)
A style is a predefined set of table, numbering, paragraph, and/or character properties which can be applied to regions within a document.”

( first definition: “can be applied to text”, second definition: “can be applied to regions within a document” )

A complete, unified and coherent “style” concept should be given all throughout the text of DIS 29500, Parts 1, Part 2, Part 3, Part 4, or whatever parts remain after the multipart proposal is developed and applied to this document ( don’t rush please ! ).

Reply
Anonymous says

2008/03/19 at 9:00 am

This was found in the original DIS 29500 Part 3 document http://www.ecma-international.org/cgi-bin/counters/unicounter.pl?name=ECMA-376_part3pdf&deliver=http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%203%20(PDF).zip
and the error is not mentioned in the 3500 NBs comments

Now ECMA is reorganizing parts, adding conformance clauses and changing conformance terminology, so I don’t know if it will be fixed in the DIS 29500 final text:

———-
Part 3: throughout

Overlapping “start” and “end” marks of informative text were found in Part 3.

Examples:

Page 229 ( 4. Introduction to PresentationML ) contains a “This clause is informative” begin mark, but the immediately preceding mark ( at page 92 ) was a “This clause is informative” begin mark too.

Page 260 ( 5. Introduction to DrawingML ) contains an “End of informative text.” end mark, but the immediately preceding mark ( at page 258 ) was an “End of informative text.” end mark too.

So, there are big portions of Part 3 where there is no way to tell if they are “normative” or “informative” text.

All the text of Part 3 should be reviewed and the overlapping informative marks be corrected.

It is suggested that the normative and informative marks system used by ECMA and “suffered” by reviewers be augmented by suffix letters added to the right line numbering.

That way, there would be no need to navigate through hundreds or thousands of pages of the text to know whether one is reading normative or informative text.

———-
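The kind of check described above is easy to automate. Here is a minimal Python sketch, assuming the begin/end marks have already been extracted from the text as (page, kind) pairs in document order; the function name and the input format are hypothetical, and only the four marks reported in this comment are included in the sample data.

```python
def find_marker_errors(markers):
    """Report informative-text marks that do not alternate begin/end.

    markers: list of (page, kind) tuples in document order,
             where kind is "begin" or "end".
    """
    errors = []
    inside = False  # True while we are inside an informative region
    for page, kind in markers:
        if kind == "begin":
            if inside:
                errors.append((page, "begin without preceding end"))
            inside = True
        elif kind == "end":
            if not inside:
                errors.append((page, "end without preceding begin"))
            inside = False
    return errors


# Only the marks mentioned in the comment; any marks between them are unknown.
markers = [(92, "begin"), (229, "begin"), (258, "end"), (260, "end")]
for page, problem in find_marker_errors(markers):
    print(f"page {page}: {problem}")
```

On this sample it flags pages 229 and 260, the two places reported above.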

Reply
Anonymous says

2008/03/19 at 9:33 am

Another one:

Part 4, Section 5.3 DrawingML – Legacy Compatibility

The normative text of Section 5.3 reads:

“Within the context of DrawingML, it must be possible (for considerations to legacy compatibility) to be able to include explicit references to specific shapes within VML Drawing parts.

5.3.2 Basics

Legacy Compatibility is part of the shape definitions and properties of the DrawingML framework.

5.3.2.1 1 legacyDrawing (Legacy Drawing Object)

This element specifies the shape ID for a legacy drawing object. These legacy drawing objects all have a shape ID associated with them that is unique across the entire document. In order to store these legacy shape IDs as well as new shape IDs this legacyDrawing element should be used.

Attributes: spid (Shape ID): Legacy Shape ID that is unique throughout the entire document. Legacy shape IDs should be assigned based on which portion of the document the drawing resides on. The assignment of these ids is broken down into clusters of 1024 values. The first cluster is 1-1024, the second 1025-2048 and so on.”

There are two problems with this text:

i) The first paragraph says “This element specifies the shape ID for a legacy drawing object.” but the same paragraph later says “In order to store these legacy shape IDs as well as new shape IDs this legacyDrawing element should be used.”. So it is not clear whether this element specifies shape IDs for legacy drawing objects only, or for both legacy and new ( non-legacy ) objects.

ii) The 2nd paragraph says “Legacy shape IDs should be assigned based on which portion of the document the drawing resides on” but gives no indication of how to perform this assignment. There are three examples that mention one criterion for assignment, but these examples are informative.

The text of 5.3.2 should be reviewed to clarify the definition of the “Legacy Drawing Object”, and to provide precise normative text on how to assign shape IDs.
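The only part of the spid description that is actually computable from the quoted text is the cluster arithmetic (1-1024, 1025-2048 and so on). A minimal Python sketch of just that part follows; the function name is hypothetical, and nothing here says which cluster belongs to which portion of the document, because the quoted text doesn't say either.

```python
CLUSTER_SIZE = 1024  # "clusters of 1024 values", per the quoted attribute text


def cluster_of(spid):
    """Return the 1-based cluster number that a legacy shape ID falls into."""
    if spid < 1:
        raise ValueError("shape IDs in the quoted ranges start at 1")
    return (spid - 1) // CLUSTER_SIZE + 1


for spid in (1, 1024, 1025, 2048, 2049):
    print(spid, "-> cluster", cluster_of(spid))
```

What an implementer still cannot derive is the mapping from "portion of the document" to cluster number, which is exactly the gap pointed out in ii).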

Reply
Anonymous says

2008/03/19 at 9:44 am

Another ( not fixed in BRM nor 2000 pages of ECMA fixes)

“This indicates a moveto the given coordinate.”

If ‘moveto’ is a VML command name then the text should be “This indicates a ‘moveto’ command to the given coordinate”.

Otherwise replace “moveto” with “move to” or define the term.

Reply
dario says

2008/03/19 at 9:58 am

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : errors not fixed in BRM nor ECMA +2200 pages fixes document.

note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

Part 4 2.3.2.5 color (Run Content Color) reads:

“themeShade (Run Content Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text “Te Resulting”.

—————-
Part 4: 2.3.2.30 shd (Run Shading) reads:

“themeShade (Shading Pattern Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text.
—————-
Part 4: 2.4.33 shd (Table Cell Shading) reads:

“themeShade (Shading Pattern Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text.
—————-
Part 4: 2.4.34 shd 1 (Table Shading Exception) reads:

“themeShade (Shading Pattern Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text.
—————-
Part 4: 2.4.35 shd (Table Shading) reads:

“themeShade (Shading Pattern Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text.
—————-
Part 4: 2.15.2.5 color (Frameset Splitter Color) reads:

“themeShade (Run Content Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text.
—————-
Part 4: 2.3.1.31 shd (Paragraph Shading) reads:

“themeShade (Shading Pattern Theme Color Shade)
…Te resulting themeShade value in the file format would be 66. end example]”

Don’t understand the text.

Errors like this show that DIS 29500 has been subject to a furious and rushed abuse of “copy and paste”, improper for a text that expects to be awarded the ISO brand. All the text should be carefully reviewed and these kinds of errors corrected.

ISO should warn standards organizations that submit text with so many editorial and technical errors, because this shows a lack of respect for the ISO national body members who must review such gobbledegooked text.

Reply
Anonymous says

2008/03/19 at 10:06 am

Found this ( not fixed in BRM nor 2000 pages of ECMA fixes) ( note: original “deprecated/transitional?” clause numbering )

Part 4, Section 3.8.31 numFmts (Number Formats) reads:

“The value of this attribute is a Globally Unique Identifier in the form of {HHHHHHHHHHHH-HHHH-HHHH-HHHHHHHH} where each H is a hexidecimal.”

If this is a typo it should be corrected, otherwise the term “a hexidecimal” should be defined.

Reply
Anonymous says

2008/03/19 at 10:13 am

Found this ( not fixed in BRM nor in the reflushing of deprecated/transitional material in the multipart reorganizing draft proposal mentioned here: http://www.itscj.ipsj.or.jp/sc34/open/0989_reference_docs.zip )

(note: legacy clause numbering )

Part 4, Section 3.8.31 numFmts (Number Formats) reads:

“Language info is a 32-bit value entered in hexidecimal format.”

The term “hexidecimal” was found many times in the text of DIS 29500. If it is a typo, then the text should be corrected, otherwise the term hexidecimal should be defined.

Reply
dario says

2008/03/19 at 10:33 am

This was found in the original DIS 29500 Part 3 document http://www.ecma-international.org/cgi-bin/counters/unicounter.pl?name=ECMA-376_part3pdf&deliver=http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%203%20(PDF).zip:

Part 4, 3.8.31 numFmts (Number Formats) reads:

“When laoding in CHT (Tawian only) locale: Era year since 1912. If preceeded by “g”, “gg”, or “ggg” then year of 1912, and year before 1912 are special, otherwise years less than 1912 are gregorian.”

Very unclear text, “suggestions” :

Replace “laoding” with “loading”
Replace “Tawian” with “Taiwan”
Replace “year of 1912” with “years of 1912”

This was partially fixed by ECMA in the +2200 page document, but they re-introduced a typo:

“lading” instead of “loading”

[Begin of personal opinion

man… change your keyboard! Or don’t rush when you draft international standards, or at least, if you copy and paste internal product documentation, read what you are copying

End of personal opinion]

Reply
dario says

2008/03/19 at 10:47 am

This is one of many found ( in a couple of hours of work, because I have better things to do than unpaid Microsoft/ECMA/ISO homework )

note 1 : error not fixed in BRM ( nor in the +2200 pages of ECMA fixes )

note 2: this is the legacy clause numbering, the definitive clause number ( and the whole DIS 29500 beast ) is still unknown and was not published for review

Part 4: throughout

Section 2.3.2.8 eastAsianLayout (East Asian Typography Settings) reads:

“…id (East Asian Typography Run ID) Specifies a unique ID which shall b used to link multiple runs containing eastAsianLayout element to each other to ensure that their contents are correctly displayed in the document.

This means that multiple runs which are broken apart due to differences in formatting can be identified as belonging to the same grouping in terms of eastAsianLayout properties, although they are separated into multiple runs of text.

[Example: Consider the following three runs in a document:
<w:r>
<w:rPr>
<w:asianLayout w:id="-1552701694" w:combine="lines" w:combineBrackets="curly" />
</w:rPr>
<w:t>two</w:t>
</w:r>
<w:r>
<w:rPr>
<w:u w:val="single" w:color="4F81BD" w:themeColor="accent1" />
<w:asianLayout w:id="-1552701694" w:combine="lines" w:combineBrackets="curly" />
</w:rPr>
<w:t>lines in</w:t>
</w:r>
<w:r>
<w:rPr>
<w:asianLayout w:id="-1552701694" w:combine="lines" w:combineBrackets="curly" />
</w:rPr>
<w:t>one</w:t>
</w:r>

Although there are three runs of content, all three regions shall be combined into a single two lines in one region based on the identical value used in the id attribute for all three runs. end example]”

[ dario:

This is the definition of an attribute of the eastAsianLayout element, but the example references a different element, w:asianLayout, which is foreign to this subclause and to the whole of Part 4.

Errors of this kind are found all throughout Part 4. Examples follow below.

end dario ]

“2.4.73 top (Table Cell Top Margin Exception):

This top cell border is specified using the following WordprocessingML:

<w:tc>
<w:tcPr>
…
<w:tcBorders>
<w:top w:val="thinThickThinSmallGap" w:sz="24" w:space="0" w:color="FF0000"/>
</w:tcBorders>
</w:tcPr>
<w:p/>
</w:tc>

The top element specifies a three point border of type thinThinThickSmallGap. end example]”

[ dario:

The example says “thinThickThinSmallGap”, but the text says “thinThinThickSmallGap”.

end dario ]

“2.3.3.18 noBreakHyphen (Non Breaking Hyphen Character): This element specifies that a non breaking hyphen character shall be placed at the current location in the run content. A non breaking hyphen is the equivalent of Unicode character 002D (the hyphen-minus), however it shall not be used as a valid line breaking character for the current line of text when displaying this WordprocessingML content.
…
If this was not desired, the non breaking hyphen character could be specified as follows:

<w:r>
<w:t>This makes a very very very wordy and deliberately overcomplicated s</w:t>
<w:nonBreakHyphen/>
<w:t>entence.</w:t>
</w:r>

This would display a hyphen character, but would not allow the text to break at that location:

This makes a very very very wordy and deliberately overcomplicated s-entence. end example]”

[ dario:

The definition is about the noBreakHyphen element, but the example contains a nonBreakHyphen element, which is a different name. Is it a typo?

end dario ]

“2.18.26 ST_EdnPos (Endnote Positioning Location): This simple type specifies the possible positions of endnotes in a document.

[Example: Consider a document in which endnotes shall be positioned at the end of the section. The section properties for this section shall be declared as follows:

<w:settings>
<w:endnotePr>
<w:pos w:val="endSect" />
</w:endnotePr>
…
</w:settings>

The val attribute is endSect, therefore the position of endnotes is specified to be at the end the section. end example]

Enumeration Value:

sectEnd (Endnotes Positioned at End of Section)”

[ dario:

The text and XML fragment read “endSect” but the enumeration names it “sectEnd”.

Problem detected:

Examples of OOXML markup with invalid XML or with typographical errors in element and attribute names are of little value to the reader and could cause confusion rather than help in understanding this specification. Microsoft first, ECMA second, and then NBs should review all the examples of Part 4 ( approx. 5500 ) to catch and correct this kind of error.

Some XML parsing and validating tools are available on the internet that could help in this task ( examples: saxon, libxml, Microsoft MSXML, etc. ), some of them at no cost to the user.

ECMA should be banned for one year from submitting fast-track DIS, and should be warned not to submit such poor, copied-and-pasted specifications derived from internal Microsoft product documentation.

end dario ]
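As a rough illustration of the tooling suggestion above, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree to test whether an example fragment is at least well-formed. The wrapper element, the function name and the deliberately broken fragment are my own assumptions for illustration; the namespace URI is only there so the w: prefix resolves.

```python
import xml.etree.ElementTree as ET

# Namespace for the w: prefix; any URI would do for a well-formedness check.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def is_well_formed(fragment):
    """Return True if the fragment parses once wrapped in a root element."""
    wrapped = f'<root xmlns:w="{W_NS}">{fragment}</root>'
    try:
        ET.fromstring(wrapped)
        return True
    except ET.ParseError:
        return False


# Fragment modeled on the ST_EdnPos example quoted above.
good = '<w:endnotePr><w:pos w:val="endSect"/></w:endnotePr>'
# Hypothetical broken variant: the w:pos element is never closed.
bad = '<w:endnotePr><w:pos w:val="endSect"></w:endnotePr>'

print(is_well_formed(good), is_well_formed(bad))
```

Note that a well-formedness check like this would not catch “endSect” versus “sectEnd” or “nonBreakHyphen” versus “noBreakHyphen”. Those need validation against the schema, which is what tools like the ones listed above are for.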

Reply
kozmcrae says

2008/03/19 at 11:30 am

Rob, good work and thank you. Microsoft is like a beautiful gleaming mansion. Unfortunately, a group of homeless squatters have taken up residence and are trashing the place as they bicker among themselves. BTW there are two page 1415s.

Reply
Rob says

2008/03/19 at 11:51 am

@Dario, one way to understand the huge number of spelling errors is that OOXML is too large to spell check. If you load the Word version of Part 4 into Word, it will give you a warning message, telling you that too many spelling errors have been detected and that it must disable spell checking.

And thanks for all the additional examples! I think this gives an important perspective on Microsoft’s BRM claims. Does it really matter if the BRM “resolved” 98.44% of the NB ballot comments, if those comments covered less than 2% of the defects in the text?

Reply
ivanstalyn says

2008/03/19 at 4:13 pm

@Dario and anonymous,

I have been reading these comments for almost 1 hour.

Oh my God, I am such a geek. I found this stuff very funny. I have been laughing at every comment.

I cannot believe this is serious stuff!

guys, have mercy.. I have asthma…too much laughing can kill me!

@Rob,

Keep up the very good work.

Reply
dario says

2008/03/19 at 5:14 pm

note: the subclause numbers are the old ones; they will be changed by the part reshuffle

Part 4:

The following elements:
. style ( Style Definition ) [2.7.3.17]
. styles ( Styles Definitions ) [ 2.7.3.18 ]

are included as subclauses of General Style Properties [ 2.7.3 ]

But, according to clause 2.7 ( Styles ), the “General Style Properties” [2.7.3] are one of the three segments of a “Style Definition” [2.7.3.17].

So, to prevent confusion, 2.7.3.17 and 2.7.3.18 should be outer and not inner subclauses of 2.7.3.

All the subclauses of 2.7.3 must be “General Style Properties” ( the “style” and “styles” elements shouldn’t be there; they should be moved to an appropriate subclause level ).

Reply
Anonymous says

2008/03/19 at 5:20 pm

Found this:

——-
Part 4: throughout

Case mismatches in element and attribute names are found all throughout Part 4. Examples:

Part 4, Section 2.3.1.33: “…If the beforeAutoSpacing attribute is also specified, then this attribute value is ignored”

In the annexed schema it is referenced as “beforeAutospacing”

Part 4, Section 2.3.2.24: “If the csTheme attribute is also specified, then this attribute shall be ignored and that value shall be used instead.”.

In the annexed schema it is referenced as “cstheme”

Part 4, 2.5.2.6 dataBinding (XML Mapping): “The custom XML data identifier, specified using the storeItemID attribute of the dataStoreItem element”

In the annexed schema it is referenced as “datastoreItem”

All the occurrences of elements and attribute names in normative and informative text of Part 4 should be reviewed, and it must be assured that they match the corresponding schema submitted as DIS 29500’s annexes.

There exist open source validation tools ( saxon, libxml ) that can be used by Microsoft ( free of charge ).
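Even without a full validator, case-only mismatches like these can be found mechanically. Here is a minimal Python sketch, assuming the names used in the prose and the names declared in the schema have already been collected into lists (how to extract them is left out, and the function name is hypothetical). The sample data is just the three pairs reported above.

```python
def case_mismatches(prose_names, schema_names):
    """Return (prose_name, schema_name) pairs that differ only in letter case."""
    by_lower = {name.lower(): name for name in schema_names}
    result = []
    for name in prose_names:
        match = by_lower.get(name.lower())
        if match is not None and match != name:
            result.append((name, match))
    return result


# Spellings as reported in this comment: prose on one side, schema on the other.
schema = ["beforeAutospacing", "cstheme", "datastoreItem"]
prose = ["beforeAutoSpacing", "csTheme", "dataStoreItem"]

for used, declared in case_mismatches(prose, schema):
    print(f"prose says {used!r}, schema says {declared!r}")
```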

Reply
Anonymous says

2008/03/19 at 5:34 pm

note 1: clauses numbering and page numbers corresponding to the original ECMA beast ( Part 4 ):

http://www.ecma-international.org/cgi-bin/counters/unicounter.pl?name=ECMA-376_part3pdf&deliver=http://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%20Part%203%20(PDF).zip

note 2: errors not fixed in the +2200 pages of ECMA fixes

—-
Part 4: throughout

Errors in elements and attribute names are found all throughout normative text of Part 4.

Examples:

Part 4, Section 2.9.10 lvlPicBulletId (Picture Numbering Symbol Definition Reference) reads: “This element specifies a picture which shall be used as a numbering symbol for a given numbering level by referring to a picture numbering symbol definition’s numPictBullet element”

But the numPictBullet element is named “numPicBullet” in the clause 2.9.21

Part 4, Section 2.15.1.18 characterSpacingControl (Character-Level Whitespace Compression) reads: “The characterSpacingControl element has a val attribute value of ‘dontCompress’, which specifies that no character compression shall be applied “

But the dontCompress value is named “doNotCompress” in the annexed schema.

The section 2.15.1.10 names in six contiguous lines of the same page ( page 1118 ) the “Automatically Hyphenate Document Contents When Displayed” element with three different names:

“autoHypehenation” in line 18
“autoHypehnation” in line 20
“autoHyphenation” in line 14

The entire Part 4 should be reviewed to find and correct all the occurrences of element and attribute names, and it must be assured that each is given a unique name throughout Part 4. The name must match the corresponding schema submitted as DIS 29500’s annexes.

ECMA should be banned for one year from submitting fast-track DIS, and should be warned not to submit such poor, copied-and-pasted specifications derived from internal Microsoft product documentation.
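Misspelled names such as the ones listed in this comment can be flagged by fuzzy-matching every name used in the prose against the names declared in the schema. Below is a minimal Python sketch using difflib.get_close_matches from the standard library; the function name, the 0.8 cutoff and the three-entry "schema" list are assumptions for illustration, with the correct spellings taken from the comment.

```python
import difflib


def suggest_schema_name(name, schema_names, cutoff=0.8):
    """Return the closest schema name if `name` is not itself in the schema."""
    if name in schema_names:
        return None  # exact match, nothing to report
    matches = difflib.get_close_matches(name, schema_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Correct spellings as given in the comment above.
schema_names = ["autoHyphenation", "numPicBullet", "doNotCompress"]

for used in ("autoHypehenation", "autoHypehnation", "numPictBullet", "dontCompress"):
    print(f"{used!r} -> did you mean {suggest_schema_name(used, schema_names)!r}?")
```

A real check would need the full list of names from the annexed schemas, and a human to confirm each suggestion, since two legitimately different names can also be close to each other.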

Reply
Rob says

2008/03/19 at 5:58 pm

Thanks, anonymous, for those naming errors. Readers who are not XML practitioners should note that XML names are case-sensitive, so “datastoreItem” and “dataStoreItem” are in fact two different and incompatible names.
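To make the case-sensitivity point concrete, here is a minimal Python sketch with the standard library's ElementTree. The tiny document is made up for illustration; only the two spellings come from the comment above.

```python
import xml.etree.ElementTree as ET

# A made-up document that uses the schema's spelling of the element name.
doc = ET.fromstring("<root><datastoreItem/></root>")

found_schema_spelling = doc.find("datastoreItem") is not None
found_prose_spelling = doc.find("dataStoreItem") is not None

print(found_schema_spelling, found_prose_spelling)
```

A consumer written against the prose spelling would simply not see the element that a producer wrote using the schema spelling.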

Reply
Anonymous says

2008/03/19 at 8:14 pm

9. Page 4492, Section 6.1.2.11 — (…) Why would “bilevel” necessarily lead to 8 colors? We’re well beyond 8-bit color these days. (…)

Three-bit color. One bit per color channel.
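Under that reading (one bit each for red, green and blue, which is this comment's interpretation rather than anything quoted from the spec), the count of 8 falls out directly. A minimal Python sketch:

```python
from itertools import product

# Each channel (R, G, B) is either off (0) or on (1): 2 * 2 * 2 combinations.
colors = list(product((0, 1), repeat=3))

print(len(colors))
for r, g, b in colors:
    print(r, g, b)
```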

Reply
Steven G. Johnson says

2008/03/19 at 10:47 pm

Another page from your list: p. 3901, section 5.5.2.3 (Anchor for Floating DrawingML Object).

One of the attributes is “locked”:

Specifies that the anchor location for this object shall not be modified at runtime when an application edits the contents of this document. [Guidance: An application might have automatic behaviors which reposition the anchor for a DrawingML object based on user interaction – for example, moving it from one page to another as needed. This element shall tell applications not to perform any such behaviors. end guidance]

As I understand it, this means that, once you set the “locked” attribute for an inline graphic, the application shall not provide any way to unset it or even to delete the graphic, which doesn’t make much sense. This clause should be written more narrowly to make it clear that the application can change the locked setting as a result of an explicit user indication, at least.

In the same anchor element, there is also a “relativeHeight” attribute:

Specifies the relative Z-ordering of all DrawingML objects in this document. Each floating DrawingML object shall have a Z-ordering value, which determines which object is displayed when any two objects intersect. Higher values shall indicate higher Z-order; lower values shall indicate lower Z-order.

Problem: It doesn’t specify what should be done if two objects have the same Z-ordering value, nor does it state that the Z-ordering values must be distinct.
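To see why the tie case matters, here is a minimal Python sketch in which two hypothetical implementations both honor "higher value is drawn on top" but break ties differently. The shape names and values are made up; the only thing taken from the spec text is the ordering rule.

```python
# (name, relativeHeight) pairs in document order; A and B share the same value.
shapes = [("A", 5), ("B", 5), ("C", 1)]

# Implementation 1: stable sort, so ties keep document order (B ends up on top).
draw_order_1 = sorted(shapes, key=lambda s: s[1])

# Implementation 2: processes shapes in reverse document order before sorting,
# so ties come out the other way round (A ends up on top).
draw_order_2 = sorted(reversed(shapes), key=lambda s: s[1])

print("implementation 1 draws last:", draw_order_1[-1][0])
print("implementation 2 draws last:", draw_order_2[-1][0])
```

Both orders satisfy the quoted rule, yet where A and B overlap the two renderings differ.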

I searched the proposed BRM resolutions, and neither of these seems to be addressed.

Reply
Steven G. Johnson says

2008/03/19 at 11:41 pm

Another page from your list: p. 5143, section 7.5.3.1 ST_Guid (128-bit GUID Value):

This simple type specifies that its values shall be a 128-bit globally unique identifier (GUID) value.

It further states that:

This simple type’s contents must match the following regular expression pattern: \{[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\}.

and they give the example {A67AC88A-A164-4ADE-8889-8826CE44DE6E}

First problem: they don’t give any guidance regarding how to interpret this ST_Guid string as a “128-bit” integer value. Presumably it is 32 hexadecimal digits, but the implementor should not need to guess, nor is the byte order indicated. [And elsewhere in the standard, GUID values are manipulated using integer arithmetic, e.g. in section 2.8.1 (Font Embedding) it requires you to “reverse the order of the bytes” of a GUID and “XOR the value” with something.]
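The byte-order ambiguity is easy to demonstrate. The following minimal Python sketch checks the example against the quoted pattern and then uses the standard uuid module to show two different 16-byte layouts for the same string: plain big-endian, and the layout in which the first three fields are little-endian. Which of these (if either) OOXML intends is exactly what the text does not say; the code only shows that the choice changes the bytes.

```python
import re
import uuid

# Pattern as quoted from 7.5.3.1 above.
ST_GUID_PATTERN = re.compile(
    r"\{[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\}"
)

example = "{A67AC88A-A164-4ADE-8889-8826CE44DE6E}"
matches = ST_GUID_PATTERN.fullmatch(example) is not None

guid = uuid.UUID(example)      # uuid.UUID accepts the braces and hyphens
as_int = guid.int              # one possible "128-bit value" reading
big_endian = guid.bytes        # all fields in big-endian order
mixed_endian = guid.bytes_le   # first three fields little-endian

print(matches)
print(big_endian.hex())
print(mixed_endian.hex())
```

An operation such as "reverse the order of the bytes" gives different results depending on which of the two layouts you start from, so the spec needs to pick one.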

Second problem: the GUID “shall” be a “globally unique identifier,” but what does it mean to be “globally unique?” Does the implementation have to require that the identifier is unique within the document, within all OOXML documents on the user’s hard disk, or all OOXML documents in the globe? No explanation is given anywhere, nor is any algorithm to generate GUIDs described.

According to Wikipedia, “GUID” is a Microsoft terminology for a specific space of 128-bit identifiers (and specific bit patterns are reserved for use in various Microsoft protocols), and is generated by specific algorithms (some of which apparently create serious privacy problems). Presumably, some version of Microsoft’s GUID “standard” is what is intended for ST_Guid, but as far as I can tell this is never explicitly indicated.

In short, OOXML needs to define what precisely is meant by a “GUID” and how they are to be generated.

Furthermore, ST_Guid is not used consistently throughout Ecma 376. In section 7.6.2.30 (Guid), it defines a b:Guid element that “specifies the GUID of a source” and uses the same format as ST_Guid in the example, but is stored as the “ST_String255 simple type”. OOXML also defines a ST_Clsid (Class ID simple type, sec. 7.4.3.2), which stores a GUID with exactly the same format definition as ST_Guid. Why not use ST_Guid for class IDs, then?

I searched the proposed BRM resolutions and couldn’t find anything on this issue. (I think. Rob, can you provide a link to a PDF of all the BRM resolutions? I’m not sure I’m searching the right thing for the BRM.)

Reply
Steven G. Johnson says

2008/03/20 at 12:05 am

Just for reference, I think I’ve been using different page numbers from you, Rob—I’ve been using the document page number as listed at the bottom of each page, but you are apparently using the PDF page number (= document page number + 7).

Not that this changes the sampling statistics, but it messes up coordination a bit, sorry.

Reply
TomS says

2008/03/20 at 3:53 am

Rob,

I can’t believe you are obsessing about these issues. ECMA’s got it covered.

1. They’ll either be fixed in maintenance;

2. The features will be deprecated to an appendix (as a Word97 .doc);

3. The binary API documentation will be posted on the web, someplace MS Live Search can find them — eventually.

I believe every complaint you raise can be answered by some combination of 1, 2 or 3.

– – – – – – – – – – – – –
It’s essential you understand the economics of this issue. Failure to approve MS OOXML as proposed will waste all the resources devoted to its acceptance so far. It will literally cause the entire work to be rebuilt from scratch.

Given the economics, MS’s ECMA division will probably have to break up this grand opus into many smaller works, addressing specific facets, and subsume ODF specs as a partial fix.

How odious would that be!!

Geez Rob, ease up.

Reply
bitmonger says

2008/03/20 at 10:48 am

You seem to be better at statistics than me, but …

Shouldn’t the error be more like
2 standard deviations not 1 standard deviation. (2 sd would be approximately 95% confidence range)

that would be 1.5 % +/- 6 %

Am I crazy here?

Garick

Reply
Rob says

2008/03/20 at 11:16 am

@Garick, your memory is correct. The 95% confidence level would be 1.96*SE on either side. But I think I already have that: since the standard error was 1.6%, the reported +/- 3% takes that into account.
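To make that arithmetic concrete, here is a minimal sketch. It assumes the usual normal approximation and takes the 1.6% standard error quoted above as given; the function name `margin_of_error` is only illustrative.

```python
def margin_of_error(standard_error, z=1.96):
    """Half-width of a normal-approximation confidence interval.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    return z * standard_error


# Standard error quoted in the comment above, in percentage points.
se_percent = 1.6

margin = margin_of_error(se_percent)
print(f"95% margin: +/- {margin:.1f} percentage points")
```

Rounded to the nearest whole percentage point, this gives the +/- 3% figure reported in the post.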

Reply
Gerardo Tasistro says

2008/03/21 at 2:02 am

@ TomS

Regarding your point #1, maintenance: my advice is, if they can’t fix it today, what makes you think they’ll have the will and time to fix it tomorrow?

Regarding your point #3, posting on the web: the fast track would have been faster if they had submitted a 10-page proposal and then just posted the rest someday on MS Live Search.

Regarding your comment on economics: please take into consideration all those people and companies that will use the proposal if it is approved as a standard. Does their time cost nothing? If the proposal is broken, so will be the implementations based on it. This will cost others money to implement it, then MS some more to fix it, and then again some more money to fix the implementations based on the no longer valid “standard”. As a user I prefer it be only MS ECMA that loses. After all, it is MS’s fault it is so poorly written. Others have had proposals approved. Clearly it wasn’t cheap, but they got it right.

Reply
Anonymous says

2008/03/21 at 2:35 pm

Rob and Dario –
I am a technical writer, and what you are seeing is the result of a “word processor” … like a food processor, but for text. This is a chop-and-drop document that most companies would have been embarrassed to release as a preliminary draft.

This could have been spell-checked, even in Word, by breaking it into chapters. But they didn’t bother.

The mismatch between functions and definitions could have been checked with very little effort. I have cross-checked technical documents of several hundred pages myself, in under a week. Was ECMA too stingy to hire a few experienced technical writers and editors? Or was Microsoft spending all the petty cash on stuffing the committees with “members” that appeared, voted and were never seen again?

Tsu Dho Nimh

Reply
Anonymous says

2008/03/21 at 3:24 pm

Rob, you’re doing a really great thing here, THANKS! (Your blog is listed on LinuxToday, and I said so on that website as well.)

Reply
Anonymous says

2008/03/21 at 3:50 pm

Can anyone tell me if this Excel Color Compatibility Problem is in any way related to OOXML?

http://dearmicrosoftofficeteam.blogspot.com/2008/03/dear-microsoft-office-2007-team-please_03.html

Reply
TomS says

2008/03/23 at 1:21 am

@ Gerardo Tasistro:
Sorry, apparently my [sarcasm] tag got swallowed by the blogger-ware.

All of my points were just restatements of classic MS replies to valid issues. I just got them out of the way before someone serious rolled them out.

The one valid belief that I have about the process is that the only way to implement any MS specific formats HAS to be a superset of ODF and other ISO or internationally recognized standards. The only parts they (the MS superset) should include will be application specific tags, mapping, decoding and schema, unless it is clearly new and previously undocumented functionality.

The stupidity of not extending ODF for common functions, or contributing to a BETTER vector graphic ML, or using a defined date standard is appalling.

A valid MS friendly format really should have been under 200 pages, simply by incorporating accepted, recognized methods and standards.

Sorry for dragging that red herring across the path.

Reply
Chris Ward says

2008/03/23 at 5:00 pm

Well yes, but what will you do about it ?

This thing is coming to a vote; it’s going to be like a presidential election. Sure to be ‘Republican’ or ‘Democrat’ who wins, but no-one is sure which, right now.

“America has spoken. We are just not sure what she has said”. Where have I heard that before ?

And the policies … the ways forward for the technology industries over the next few years … are different.

Speak now. Get your vote counted.

Reply
zbog says

2008/03/24 at 4:13 am

Hello Rob,

Pardon me for asking, but why do we have to bring up the remaining defects in OOXML? The standardization process should be based on the pledge of the initiator of that standard that it is good.
That is, by default any proposed standard should be considered bad and not worthy of standardization, and the initiator should be required to come up with evidence that proves its worthiness. And should that evidence be unconvincing, the standard should be turned down by ISO.

I had little success finding any Microsoft document offering such evidence. Why is this case treated differently from the norm? Why aren’t we requiring such a document from the initiator, ECMA/Microsoft?

Here is a forum post of mine, putting the same question:
http://www.noooxml.org/forum/t-48601#post-129997

Reply
Gerardo Tasistro says

2008/03/24 at 10:34 am

@ TomS, sorry about that. My apologies, I was too quick to pass judgment. Your post sounded too much like the “intellectual brilliance” and “Vulcan logic” emerging from Redmond these days. It actually looked like a true Microsoft sponsored post.

I totally agree with your comment. It is something I never actually considered as an option. Maybe because by default I believe Microsoft will want to go its own way rather than share the path.

Heck, it would be so much easier if they did use ODF. On top of that, they already have a pretty good MS Office to ODF converter. It’s called Open Office. So even that would be covered.

As a side note the ODF (as published May 1 2005) is 706 pages long (minus 30 for index). That makes OOXML about 9 times bigger. Now given Rob’s comment on three different ways to show font color we can think there are three ways to do everything. That still makes OOXML 3 times bigger (9/3=3). Now unless DIS 29500 is published with font size 36, there has got to be something really really wrong about it don’t you think?
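A quick check of those ratios, as a rough sketch: it assumes the comparison uses the original 6,045-page length of DIS 29500 from the post, since the comment does not say which page count it is based on. The variable names are only illustrative.

```python
# Page counts quoted in the comment and in the post.
odf_pages = 706        # ODF as published May 1 2005
odf_index_pages = 30   # index pages subtracted in the comment
ooxml_pages = 6045     # assumption: original DIS 29500 length from the post

# How many times longer OOXML is than the ODF body text.
size_ratio = ooxml_pages / (odf_pages - odf_index_pages)

# Same ratio if every feature were specified three different ways.
ratio_per_way = size_ratio / 3

print(f"OOXML / ODF: {size_ratio:.1f}x")
print(f"Allowing three ways to do everything: {ratio_per_way:.1f}x")
```

Under that assumption, both results round to the 9 and 3 used in the comment.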

Reply
Eduardo Maza says

2008/04/04 at 6:12 pm

How does this analysis compare to other standards already approved by ISO or any other organization?

Reply
Anonymous says

2008/04/05 at 7:52 am

Now that Microsoft has all their people in place at MS-ISO, they can proceed apace.

All Microsoft products will now be approved as standards.

All non-Microsoft software standards can now be repealed.

Reply

Trackbacks

Microsoft bowing to the inevitable, will offer ODF support with Office SP2 — Lawyerist says:

2011/02/11 at 12:11 pm

[…] Microsoft bought and paid for its ISO certification last year. (OOXML is not open; Microsoft merely promises not to sue anyone for using its format.) As an extra layer of protection, perhaps, Microsoft’s published OOXML specification is a mess, and Microsoft Office does not conform to its own spec. […]

Reply
