Introduction
There are several words, more widely used than understood, that recur frequently when discussing standards. Specification and standardization require us to describe technology precisely, in such a way that practitioners in that field can achieve the goals set out in the standard. But this precision is only perfectly intelligible to those who share the same code words. What follows is a handful of the more important ones, what they mean, and how they are unintentionally confused or intentionally misused. You are at a distinct disadvantage when reading (or writing) a news article or a blog post, or when evaluating an argument, if you do not know the correct meaning of the following words.
Standard
Take the definition from ISO/IEC Guide 2:2004, definition 3.2:
[A] document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context.
NOTE Standards should be based on the consolidated results of science, technology and experience, and aimed at the promotion of optimum community benefits.
So, it is a document, a written description, not an embodiment in the form of a product, that is standardized. Its aims are the “achievement of the optimum degree of order” and “promotion of optimum community benefits”, and it is achieved through consensus and consolidation.
international standard
According to ISO/IEC Guide 2:2004, definition 3.2.1.1:
[A] standard that is adopted by an international standardizing/standards organization and made available to the public.
International Standard
[An] international standard where the international standards organization is ISO or IEC
Note the distinction. With capital letters only ISO or IEC standards apply. With lowercase, other standards are included. This is a bit self-serving. ISO and IEC Standards are the only International Standards, because ISO says so. Sorry ITU, sorry CEN, sorry W3C.
So think of “International Standards” as a controlled mark of ISO, like “parmigiano reggiano” is a controlled mark of the Northern Italian Cheese Consorzio.
Normative
The normative parts of a standard are those which set out the scope and provisions of the standard. See ISO Directives, Part 2, section 3.8.
Provisions
The provisions of a standard consist of:
- Requirements that must be met for conformance
- Recommendations
- Statements of permissible, possible or capable actions
See ISO Directives, Part 2, section 3.12.
Note that standards have specific words which denote and distinguish requirements, recommendations and capabilities. Different standards organizations have different vocabulary for this, so a W3C Recommendation, an IETF RFC and an ISO Standard may have different ways of stating the same provision. For ISO Standards, the conventions are:
- “shall” and “shall not” are the normal terms for expressing requirements.
- “should” and “should not” are the normal terms for expressing recommendations.
- “may” and “need not” are the normal terms for expressing permission.
- “can” and “cannot” are the normal terms for expressing possibility and capability.
This is necessary because of the extreme ambiguity of the English language in the area of modality. Consider the following sentences, using the word “must”:
- (On hearing the doorbell ring), “Oh, that must be the mailman!” [expressing likelihood]
- (To a misbehaving child) “You must obey your mother” [expressing obligation]
Or the following exchange with a teenage daughter:
- Teen: “I shall return by 11pm” [simple future]
- Parent: “No, you shall return by 10pm” [expressing a command]
We can be loose and still be understood, in context, in normal conversation, but in standards work we try to be precise and uniform in the use of our control vocabulary.
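The ISO conventions listed above can be sketched as a small lookup. This is only an illustration: the function name classify_provision and the table layout are made up for this example, and a naive keyword match cannot resolve context (it would, for instance, misread the month "May").

```python
import re

# ISO verbal forms as listed above, mapped to the kind of provision they express.
# Negative forms come first so that "shall not" wins over "shall" at the same position.
VERBAL_FORMS = [
    ("shall not", "requirement"),
    ("should not", "recommendation"),
    ("need not", "permission"),
    ("cannot", "possibility/capability"),
    ("shall", "requirement"),
    ("should", "recommendation"),
    ("may", "permission"),
    ("can", "possibility/capability"),
]


def classify_provision(sentence):
    """Return (verbal form, provision kind) for the earliest form found, else None."""
    text = sentence.lower()
    best = None
    for form, kind in VERBAL_FORMS:
        match = re.search(r"\b" + re.escape(form) + r"\b", text)
        # Strict "<" keeps the earlier table entry when two forms start at the same spot.
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), form, kind)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    print(classify_provision("A producer shall not emit empty elements."))
    print(classify_provision("Oh, that must be the mailman!"))
```

Note that "must" is deliberately absent from the table, since it is not among the ISO forms listed above, so the doorbell sentence is not classified at all.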
Conformance
This is simply a question of whether something meets the requirements of the standard. However, for many standards, there are multiple levels, perhaps even multiple classes of conformance. So you need to be very specific about what you are saying.
For example, you should not ask “Does Excel 2007 conform to OOXML?” You should ask “Is Excel 2007 a conforming transitional class SpreadsheetML Producer?” If you count it all up, OOXML probably has at least 18 distinct conformance classes, by various combinations of applications, documents, readers/writers and transitional/strict conformance classes.
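One way to arrive at a count of 18 is sketched below. The three axes and their labels are my assumption about how the combinations multiply out (three markup languages, three roles, two levels); they are not quoted from the OOXML text, and the real conformance clause may slice things differently.

```python
from itertools import product

# Assumed axes; the names are illustrative, not quoted from the OOXML specification.
MARKUP_LANGUAGES = ["WordprocessingML", "SpreadsheetML", "PresentationML"]
ROLES = ["Document", "Consumer", "Producer"]
LEVELS = ["transitional", "strict"]


def conformance_classes():
    """Enumerate every combination of level, markup language and role."""
    return [
        f"{level} {markup} {role}"
        for markup, role, level in product(MARKUP_LANGUAGES, ROLES, LEVELS)
    ]


if __name__ == "__main__":
    classes = conformance_classes()
    print(len(classes))
    for name in classes:
        print(name)
```

Under these assumptions the question about Excel 2007 picks out exactly one entry from the list, "transitional SpreadsheetML Producer", which is the level of specificity a conformance claim needs.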
Note in particular that conformance does not mean that an application implements the entire standard.
[My definition above is not very satisfactory. Anyone have something better? Is there an ISO definition of conformance?]
Compliance
This is not a typical standards term. The more typical term is “conformance”. Best to avoid it unless you are talking in a regulatory or legal context. See ISO Directives, Part 2, section 6.6.1.1:
A document does not in itself impose any obligation upon anyone to follow it. However, such an obligation may be imposed, for example, by legislation or by a contract. In order to be able to claim compliance with a document, the user needs to be able to identify the requirements he/she is obliged to satisfy. The user also needs to be able to distinguish these requirements from other provisions where there is a certain freedom of choice.
Validity
This is an XML term, referring to the relationship between an XML document instance (an XML file) and a schema (the definition of the syntax of the markup language). Generally, an XML document instance is valid if it adheres to the constraints defined in the schema. The precise definition of validity will depend on the schema definition language used.
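To make the idea concrete, here is a toy sketch. The "schema" is a hypothetical mini-language invented for this example (a dictionary mapping each element name to the child element names it allows); it is not DTD, XML Schema or Relax NG, each of which defines validity in its own, much richer way.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema: element name -> set of child element names it allows.
TOY_SCHEMA = {
    "memo": {"to", "body"},
    "to": set(),
    "body": set(),
}


def is_valid(xml_text, schema, root_name):
    """True if the text is well-formed XML and every element obeys the toy schema."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid either
    if root.tag != root_name:
        return False
    stack = [root]
    while stack:
        element = stack.pop()
        allowed = schema.get(element.tag)
        if allowed is None:
            return False  # element is not declared in the schema
        for child in element:
            if child.tag not in allowed:
                return False  # child is not permitted under this parent
            stack.append(child)
    return True


if __name__ == "__main__":
    print(is_valid("<memo><to/><body/></memo>", TOY_SCHEMA, "memo"))
    print(is_valid("<memo><cc/></memo>", TOY_SCHEMA, "memo"))
```

The same instance could be valid against one schema and invalid against another, which is why validity is always stated relative to a particular schema.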
I’d welcome any suggestions for other words or definitions that should be included here.
I like your approach here, and the separation of conformance and compliance is very useful.
When I saw the exchanges between you and Alex Brown on “validity” something kept nagging at me about how validity applies with respect to XML Schema. (I notice you point out that different notions apply to different schema schemes, but then I wonder what it means when multiple flavors are provided or schema translations are used.)
First, I notice that XML 1.0 has been pretty consistent about validity of documents and it is related to the presence of a DTD for the most part. There are “validity constraints” throughout the XML specification (e.g., section 3.1.1) and while they can all be considered syntactical/grammatical rules, they go beyond treating the DTD solely as a grammar of the context-free variety.
Without getting into how different schema systems might grandfather some of those constraints, absent any document type declaration, I revisited the XML Schema specification and found what had been nagging at me.
In the specification of XML Schema, there is great care to speak of schema [relative] validation of XML instance documents. The suggestion of document validity is avoided. There is also a preference for “schema-validity assessment” [Section 2.1 of XML Schema Part 1: Structures].
Is Relax NG any different?
Are you talking about conformance of a schema file itself to the underlying schema definition language?
Relax NG defines conformance for validators. It does not define conformance of schemas. Of course, Relax NG has a schema for itself, written as a Relax NG schema. The ODF 1.0 schema is valid Relax NG, relative to the Relax NG schema.
I think it might also be useful to define some elements of intent, don’t you think?
For instance, to the extent that OO.o, Google Docs, and IBM Symphony produce documents with ODF file extensions that are not perfectly conformant to the ODF 1.1 specification, what is the intent? Are these bugs in the software that must be reported and fixed? And does this mean that when any of those platforms produce files with ODF extensions that ARE conformant, the developers are complying with the spec?
Similarly, looking at Office 2007’s OOXML output and comparing it to the DIS29500 spec: you can’t even call non-conformance a bug, can you? The developers of Office 2007 weren’t trying to adhere to that specification because it didn’t exist yet (and still doesn’t). At such time as the spec actually exists, and Microsoft releases a version that claims to write it, then there’s something to be evaluated on whether the software has a bug.
I say this because Alex’s “smoke tests” have been for the most part mystifying to me. What’s he trying to test? Whether it’s possible to get output matching a spec that doesn’t exist yet from a product released 18 months ago? Whether it’s possible to get output from one piece of software that isn’t conformant to a given output format? Whether OO.o has bugs? Whether Office 2007 has bugs?
This is software. They both have bugs. Was that ever in question?
http://www.askoxford.com/concise_oed/shall?view=uk
“Strictly speaking shall should be used with I and we to form the future tense, as in I shall be late, while will should be used with you, he, she, it, and they, as in she will not be there. This, however, is reversed when strong determination is being expressed, as in I will not tolerate this, and you shall go to school. In speech the distinction tends to be obscured, through the use of the contracted forms I’ll, she’ll, etc.”
@rob: I was thinking of the “validity” of instance documents, not of the schema itself as a document.
I notice that the XML 1.0 “validity constraints” do address the validity of a document type declaration (DTD) as well as the validity of XML 1.0 documents and the document type declarations, if any, that apply to them.
But I was sticking to teasing out “validity” with respect to document instances (not their schemas) and how schema-validity assessment is used in that context (for XML Schema, at least).
To sharpen my question: What is the language of the Relax NG specification with regard to what is said about a document that is accepted by a proper validator using a particular (valid?) Relax NG Schema?
If there is some sort of validation condition on Relax NG Schemas themselves, what is that called, and does it involve more than the schema being accepted by a Relax NG Validator applying the Relax NG Schema?
[Yes, I am being lazy. I am not sure I have my hands on the correct Relax NG spec. This seems like a useful conversation and clarification in any case. I have to dig into this for my nfoWorks project eventually, but just not yet.]
Rob, the definition you give for “validity” is the one that leaves the most room for variation, and it also seems to be the key point of disagreement between you and Alex regarding validation of ODF documents. You quote Relax NG 3.25 “a member of the set of XML documents described by the schema”. Alex says that by definition if membership in the set is not computable then “there is no set” to validate against. That’s the crux, and the question I would have for Alex is whether there is some specification that says that the set must be computable, or whether he is basing his assertion on common sense. If the latter, perhaps it seems like common sense that a set is not a set if you can’t decide whether something is or is not a member of it. But computability theory defines and works with computable sets, recursively enumerable sets, and noncomputable sets. There is no fundamental reason to reject the latter two out of hand. So it comes down to the question of whether there is a definition of “validity” in some formal specification that applies to ODF that does specify constraints on the computability of the set of valid documents.
Sidney, there is nothing stated in the Relax NG specification that answers your question. Nothing is stated that would give the expectation that the word ‘set’ is used in a strict Cantorian sense either.
But note that a set does not need to be computable in order to test membership. For example, take the set consisting of the following numbers:
1,2,3, the number of regular season wins for the Red Sox in 2008, and the number of twin primes.
Is 1 a member of the set? Yes, obviously.
Is 95 a member of the set? This is unknown. 95 is not one of the integers explicitly included. And we know that there are far more than 95 twin primes. But we will not know until the season ends whether the Red Sox will win exactly 95 games.
Is 5 a member of the set? No. We can say that with certainty, because we know that the Red Sox have already won 23 games so far this season, and there are more than 5 twin primes.
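The three answers above (yes, unknown, no) can be sketched as a three-valued membership test. Everything here is illustrative: the function name is_member and the (low, high) representation are made up for the example, the upper bound of 162 assumes a standard regular season with no tiebreaker game, and the lower bound on twin primes is just "more than 95" as stated above.

```python
# Each element is a (low, high) pair of bounds on its value.
# low == high means the value is known exactly; high=None means no known upper bound.
ELEMENTS = [
    (1, 1),
    (2, 2),
    (3, 3),
    (23, 162),   # Red Sox regular season wins: at least 23 so far; 162 is an assumed cap
    (96, None),  # number of twin primes: only known here to be more than 95
]


def is_member(n, elements):
    """Return True, False, or None (unknown) for whether n is in the set."""
    unknown = False
    for low, high in elements:
        if low == high == n:
            return True  # n equals an exactly known element
        if low != high and low <= n and (high is None or n <= high):
            unknown = True  # n might equal an element we cannot pin down yet
    return None if unknown else False


if __name__ == "__main__":
    for candidate in (1, 95, 5):
        print(candidate, is_member(candidate, ELEMENTS))
```

So membership can be answered with certainty for some candidates even though the set as a whole has not been computed.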