Re: Structured Document Design for XML or SGML

Subject: Re: Structured Document Design for XML or SGML
From: Dan Emory <danemory -at- primenet -dot- com>
To: Sharon -dot- Kadlec -at- NWA -dot- COM, "Marcus Carr" <mrc -at- allette -dot- com -dot- au>, Chris Despopoulos <cud -at- arrakis -dot- es>, "FrameSGML List" <FrameSGML -at- onelist -dot- com>, "Free Framers" <framers -at- omsys -dot- com>, "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Fri, 19 May 2000 12:46:39 -0700

A QUIZ FOR SGML PURISTS

1. Define "content" as it applies to element naming conventions. Define it in such a way that a clear and unambiguous distinction can be made between a name that conveys content and one that describes a document object such as a paragraph, a text range within a paragraph, a graphic, a table, or a list.
Define it also in such a way that a clear and unambiguous distinction can be made between a name that conveys nothing but content and one that conveys format. Your description shall not include any cant, SGML purist insider jargon, or other escape mechanisms that seek to avoid the many contradictions involved in making a workable DTD that adheres to the SGML purist rule that content must be separated from format.

2. An element named Para has many parents, but context alone cannot determine how the element should be formatted. If defining the formatting parameters in attributes is forbidden, how do you solve this problem so that a style sheet of some sort can produce the correct formatting? If your answer is to use Processing Instructions, explain how that is a better solution than using attributes for the same purpose.

3. If the name Para is forbidden because it describes a document object rather than content, would you change its name to P because that name is "formatting neutral", even though everyone who uses the DTD is supposed to know that P means Paragraph? If that would be your solution, what content information does the name P convey which makes it superior to Para?

4. A List element has the content model (Item, Item+). It is used to produce four types of lists: bulleted, arabic-numbered, alpha-numbered, and indented text with no prefix. An attribute named "Type" has a name token group with the permitted values 1, 2, 3, and 4, where each numeric value specifies one of the four list types. In order for authors to properly create such lists, each value must be permanently associated with a particular type, and any style sheet must format the lists according to the attribute-specified type. Replacement of the numeric values with names such as bulleted, arabic, alpha, and indented would eliminate the need for authors to memorize the meaning of each numeric value. Would you refuse to make such a change on the grounds that it would introduce forbidden formatting information into the DTD? If so, please explain why the numeric values are nothing more than a figleaf to conceal the fact that the Type attribute is specifying formatting information, no matter what kinds of values are used.
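
To make question 4 concrete, here is a minimal Python sketch of what a style sheet has to do with the Type attribute. The prefix functions, the render_list name, and the particular pairing of 1 through 4 with the four list types are assumptions made for illustration; the question only says that each number stands for one type.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix generators, one per list type named in question 4.
PREFIXES = {
    "bulleted": lambda n: "*",
    "arabic": lambda n: f"{n}.",
    "alpha": lambda n: f"{chr(ord('a') + n - 1)}.",
    "indented": lambda n: "",
}

# Assumed pairing of the numeric codes with the four list types.
NUMERIC_ALIASES = {"1": "bulleted", "2": "arabic", "3": "alpha", "4": "indented"}


def render_list(xml_text):
    """Return the formatted lines of a List element, driven by its Type attribute."""
    root = ET.fromstring(xml_text)
    list_type = root.get("Type")
    # A numeric code is translated to the name it stands for; names pass through.
    list_type = NUMERIC_ALIASES.get(list_type, list_type)
    prefix = PREFIXES[list_type]
    lines = []
    for n, item in enumerate(root.findall("Item"), start=1):
        text = (item.text or "").strip()
        lines.append(f"{prefix(n)} {text}".strip())
    return lines
```

Either spelling of the value reaches the same formatter, so in this sketch the only difference between 2 and arabic is whether the author has to memorize a lookup table.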

5. Suppose that the content model of the Item element in the List element above is:
(#PCDATA, List?)
which would optionally allow another List to be nested under an item. But suppose further that such nesting is not allowed under the Indented text list type, and that, for the bulleted list type, only nested lists of type Bulleted are allowed. To make this possible, the content model for the List element would have to be changed to (Bulleted | Arabic | Alpha | Indented) so that there could be a separate content model for each list type. Also, the Type attribute would be removed from the List element, since it is no longer needed. Now, with this change, the content model for the Bulleted element would be:
((Item, Bulleted?), (Item, Bulleted?)+)
whereas the content model for the Arabic element would be:
((Item, (Arabic | Bulleted | Alpha)?), (Item, (Arabic | Bulleted | Alpha)?)+)
Now you have element names (Arabic, Alpha, Bulleted, Indented) which are clearly conveying formatting information. What would you do? Would you change the element names for these list types to Type1, Type2, Type3, and Type4 so as to once again conceal with a figleaf the fact that these elements are describing the forbidden formatting information?
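
The nesting rules in question 5 can be sketched as a small Python check. This is an illustration only: the ALLOWED_NESTED table and the nesting_errors function are hypothetical names, the Alpha row is an assumption (the question gives no content model for Alpha), and the check looks only at which list may follow an Item, not at the two-item minimum.

```python
import xml.etree.ElementTree as ET

# Which list elements may be nested inside each list element.
# Bulleted, Arabic and Indented follow question 5; the Alpha row is assumed.
ALLOWED_NESTED = {
    "Bulleted": {"Bulleted"},
    "Arabic": {"Arabic", "Bulleted", "Alpha"},
    "Alpha": {"Arabic", "Bulleted", "Alpha"},  # assumption, not stated in the post
    "Indented": set(),
}


def nesting_errors(element):
    """Return one message for every nested list the content models forbid."""
    errors = []
    allowed = ALLOWED_NESTED.get(element.tag)
    for child in element:
        is_list = child.tag in ALLOWED_NESTED
        if allowed is not None and is_list and child.tag not in allowed:
            errors.append(f"{child.tag} is not allowed inside {element.tag}")
        # Recurse so that deeper levels of nesting are checked as well.
        errors.extend(nesting_errors(child))
    return errors
```

The rules are enforceable only because the element names carry the list type, which is exactly the information the purist rule would forbid in a name.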

6. Until the CALS table model came along, there was no viable way to describe how to build and display an SGML table. This apparently was because SGML purists could not bear the thought that any workable solution would inevitably introduce formatting into the DTD. The element names in the CALS table model describe document objects, not content, and most of the attributes for each element in the model describe how to format the table. The acceptance of the CALS table model is almost universal. How do you explain this exception, and what makes it different from the many needed exceptions which you reject?

7. The requirements specification for developing a DTD identifies certain situations where four equally important facets (A, B, C, and D) of content are present, which can appear singly or in any combination. Thus the following facet combinations can occur: A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD. Would you create and name an element for each possible combination, or would you create a single element with attributes to describe each facet, where the default for each attribute is no value, or would you do something else?
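
For a sense of scale in question 7: four facets that can appear singly or in any combination give 15 non-empty combinations. The Python sketch below enumerates them and shows the single-element alternative. The function names and the attribute value "yes" are assumptions made for illustration; an absent attribute stands for "no value".

```python
from itertools import combinations

FACETS = ("A", "B", "C", "D")


def facet_combinations(facets=FACETS):
    """Every non-empty combination of facets, as strings such as 'ABD'."""
    return [
        "".join(combo)
        for size in range(1, len(facets) + 1)
        for combo in combinations(facets, size)
    ]


def facet_attributes(combination, facets=FACETS):
    """Attributes for one element: a facet is listed only when it is present."""
    return {facet: "yes" for facet in facets if facet in combination}
```

One element per combination means 15 element names to define and remember, and the count doubles (plus one) with each added facet; one element with four optional attributes covers the same ground.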

8. To further elaborate on my statement arguing the need for multiple facets to describe information content, consider the new Resource Description Framework (RDF) in the XML family of standards, whose purpose is (among others) to facilitate database search and retrieval of information. RDF description patterns are applicable to individual nodes or elements within documents as well as whole documents. Each RDF description includes a Uniform Resource Identifier that uniquely specifies where the resource is located (e.g., within a database, a file, or an element whose ID attribute specifies an absolute or relative XPointer location term).

RDF descriptions can be created independently, or they can be embedded in the structure of the document, or both. There is no reason that I can think of why this could not be incorporated into SGML documents as well as in XML ones. It is possible to define many different description patterns, some more elaborate than others. If RDF offers a much better and more comprehensive way to describe information content at any level of structure, do you believe it might moderate the SGML purists' insistence that element names must always describe content? If not, why not?
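
As a rough illustration of an RDF description attached to a single element, here is a Python sketch that builds one with the standard library. The describe function, the resource URI, and the property values are hypothetical; the two namespace URIs are the published RDF syntax and Dublin Core element namespaces, and Dublin Core is used only as one example of a description pattern.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def describe(resource_uri, properties):
    """Build an rdf:RDF tree holding one Description of resource_uri."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_uri},
    )
    # Each property becomes one child element; several facets can coexist.
    for name, value in properties.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return rdf
```

For example, describe("manual.xml#sec-4.2", {"subject": "hydraulic pump", "type": "removal procedure"}) records two facets of one element without either of them having to appear in the element's name.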

9. The SGML purists' claim is that "hardcoding formatting attribute values into the data is wrong and that the application should be responsible for rendering it so that the data can be used with different media." But XML defines a new style sheet standard, XSL. Using middleware, it should be possible to extract XML data from a database, and build a customized style sheet on the fly to fit the requirements of the user (human or non-human) who initiated the database query. If style sheets become dynamically generated, doesn't that make the purists' concern irrelevant? Why not hardcode the formatting for the most demanding formatting requirement (e.g., high-quality printed books), and let the middleware either ignore formatting attributes or modify how they are used, depending on the media and the end user?
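
The "ignore formatting attributes" half of question 9 can be sketched in a few lines of Python. This is not an XSL generator, only a stand-in for the middleware step; the attribute names in FORMAT_ATTRS, the media names, and the prepare_for_media function are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical formatting attributes hardcoded for high-quality print.
FORMAT_ATTRS = {"Font", "Indent", "KeepWithNext"}


def prepare_for_media(root, media):
    """Return a copy of the tree; formatting attributes survive only for print."""
    result = copy.deepcopy(root)
    if media != "print":
        for element in result.iter():
            # Drop the print-oriented attributes, keep everything else.
            for name in FORMAT_ATTRS & set(element.attrib):
                del element.attrib[name]
    return result
```

The stored data is never altered; each consumer gets a view in which the formatting attributes are either honored or dropped.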

10. Why do most of the commonly used DTDs (J2008, ISO 12083, DocBook, MIL-M-38784, HTML, and even the ATA DTD) violate with wild abandon the SGML purists' view of how a DTD should be built? Is it because the people who developed them just don't get it right, or is it because, pragmatically, the reductionistic viewpoint of the purists is simply impractical in the real world?


====================
| Nullius in Verba |
====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory -at- primenet -dot- com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
majordomo -at- omsys -dot- com with "subscribe framers" (no quotes) in the body.





