Re: Structured HTML (was: Is it possible to single-source online in HTML?)
Subject: Re: Structured HTML (was: Is it possible to single-source online in HTML?)
From: Mark Baker <mbaker -at- OMNIMARK -dot- COM>
Date: Wed, 12 May 1999 10:31:47 -0400
Simon North wrote:
>Sorry, I can't let this go unchallenged. HTML (and Word) are
>inherently unstructured.
Well, like I said, the misconception is common. ;->
HTML and Word are both highly structured. They structure information for
presentation and they both do it well. Presentation structure is structure,
just as much as any other kind of structure. Many members of the SGML/XML
community like to treat any structure oriented to presentation as if it were
not structure, because they advocate the exclusive use of non-presentation
forms of structure. I also advocate the use of such non-presentation-oriented
structures when they are necessary and useful. But that doesn't mean the
presentation structure is not structure or that you cannot usefully use and
process presentation structure.
>The HTML DTD does not enforce any
>particular element hierarchy, in the same way as Word does not
>enforce the use of styles in any particular order.
Structure is not synonymous with either hierarchy or strictness. Strict
hierarchical structures are appropriate for some purposes. Flat flexible
structures are appropriate for some purposes. Corporations are learning this
lesson. Technical communicators should too! Both Word and HTML have good
flexible and relatively flat structures for describing the presentation of
text. As long as the processing you want to do can work with those
structures they are adequate for your purposes. There are no inherently good
structures or inherently bad structures. There are only structures which are
or are not good enough for what you need to do. In some, but not all, cases
HTML will be good enough.
>
>I think the point that Mark was trying to make is that HTML (and
>Word) can be used in a structured manner.
No, that is not what I am saying. I am saying that they are inherently
structured to express presentation. It is also true, of course, that you can
infer some things about the rhetorical structure of a document from the way
it is formatted, and this is also useful. The fact that you can do so simply
bolsters the argument that HTML is sometimes as much structure as you need.
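
As a rough illustration of inferring rhetorical structure from presentation markup, the sketch below reads heading levels out of an HTML fragment and treats them as an outline. It uses only Python's standard `html.parser` module. The class name `OutlineParser`, the helper `outline`, and the sample markup are all made up for this example, and it assumes the author used h1-h6 consistently to mark sections.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 heading."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.outline.append((self._level, text))
            self._level = None


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical sample document.
SAMPLE = """
<h1>Installing the Widget</h1>
<p>Some introductory text.</p>
<h2>Before you <em>begin</em></h2>
<p>Check the box contents.</p>
<h2>Mounting</h2>
"""

for level, text in outline(SAMPLE):
    print("  " * (level - 1) + text)
```

If the headings were chosen for their look rather than their rank, the outline will be wrong, which is the limit of this kind of inference.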
>It goes further than just the correct use of element tags.
>For example, I am currently using ID attributes on HTML tags to
>approximate the kind of structural tagging that I would rather use
>XML for, but cannot yet achieve due to the lack of tool support.
Why would you rather use XML if you can do the task adequately using HTML?
By all means create a new language if an existing language really won't do.
But why reinvent the wheel if it will?
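
To make the ID-attribute approach concrete, here is a minimal sketch of pulling labelled fragments out of ordinary HTML so they can be reused elsewhere. It is not Simon's actual setup, which the thread does not describe. The id values `warning` and `part-number`, the class `IdTextParser`, and the helper `text_by_id` are hypothetical, and the code assumes reasonably well-formed markup with explicit end tags.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IdTextParser(HTMLParser):
    """Map each id attribute value to the text inside that element."""

    def __init__(self):
        super().__init__()
        self.by_id = {}
        self._open = []   # stack of (tag, id or None) for open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        elem_id = dict(attrs).get("id")
        if elem_id is not None:
            self.by_id[elem_id] = []
        self._open.append((tag, elem_id))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries an id.
        for _, elem_id in self._open:
            if elem_id is not None:
                self.by_id[elem_id].append(data)


def text_by_id(html):
    parser = IdTextParser()
    parser.feed(html)
    parser.close()
    return {k: " ".join("".join(v).split()) for k, v in parser.by_id.items()}


# Hypothetical sample document.
SAMPLE = """
<div id="warning"><p>Disconnect power before opening the case.</p></div>
<p id="part-number">Part <b>WX-100</b></p>
"""

for key, text in text_by_id(SAMPLE).items():
    print(key, "->", text)
```

The tags still describe presentation; only the id values carry the extra meaning, so the same file can be displayed as is and also mined for content.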
>The problem comes back to a basic question of 'information
>entropy'. If there is enough 'information' in the original format,
>conversion to a lesser/different format is easy. Going from rich
>(complex, such as SGML) 'downwards' to HTML (or ASCII) is easy;
>coming back up is hard, and often requires human intervention.
There is a temptation to think of text conversion like JPEG compression:
beyond a certain point it must be lossy. This is not true. Text conversion
is usually "gainy". That is, the conversion process adds new structure.
SGML to HTML conversion, for instance, is usually both a lossy and a gainy
conversion. Structure not relating to presentation is thrown away.
Structure relating to presentation is added. An HTML to Word transformation
would also be both lossy and gainy. Nested tables would be lost. Pagination
would be added. The transformation process is an active one. It adds things
that did not exist before.
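
A small sketch may help show a conversion that is lossy and gainy at once. It uses Python's standard `xml.etree.ElementTree` module, not OmniMark. The source vocabulary (`procedure`, `step`, `partnumber`, and the `audience` attribute) is invented for the example, as is the function `to_html`.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup that describes content, not presentation.
SOURCE = """
<procedure audience="installer">
  <step><partnumber>WX-100</partnumber> goes in slot A.</step>
  <step>Tighten the <partnumber>B-7</partnumber> bolt.</step>
</procedure>
"""


def to_html(procedure):
    """Turn a hypothetical <procedure> element into an HTML ordered list."""
    ol = ET.Element("ol")                  # gain: steps become a numbered list
    for step in procedure.findall("step"):
        li = ET.SubElement(ol, "li")
        li.text = step.text
        for child in step:
            b = ET.SubElement(li, "b")     # gain: bold styling
            b.text = child.text            # loss: the name "partnumber" is gone
            b.tail = child.tail
    return ol                              # loss: the audience attribute is dropped


html = ET.tostring(to_html(ET.fromstring(SOURCE)), encoding="unicode")
print(html)
```

The output knows that the steps are numbered and that certain words are bold, which the source did not say. It no longer knows that those words are part numbers or who the procedure was written for.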
>OmniMark is an excellent tool, it has few equals. However, I'd add
>my own word of caution. The process is balanced by the quality of
>the input and the power of the tools. Choosing a powerful tool at
>the cost of input quality forces you to be dependent on the tool and
>may indeed press you to change your working to accommodate
>the tool (in my eyes the worst possible sin you could commit).
Setting aside that OmniMark, as a full programming language, is one of the
few tools that does not restrict your choices of input, I have to point out
that our choices are always limited by the availability and quality of our
tools. If you cannot provide an adequate authoring environment that writers
can use easily and correctly, you've got nothing. To insist on a supposedly
pure form of input when you cannot, for lack of tools, actually produce any
input of that format, is pointless.
Having said that, let me confess that we use a home-grown database system
which makes extensive use of SGML in database fields to enable collaborative
authoring and single sourcing of multiple output formats. It works very
well, but we had to custom-build it. This is worthwhile for us, but would
not necessarily be so for everyone.
I am not particularly pro-HTML or anti-XML. I am simply for working solutions
rather than theoretical ones.
---
Mark Baker
Manager, Technical Communication
OmniMark Technologies Corporation
1400 Blair Place
Gloucester, Ontario
Canada, K1J 9B8
Phone: 613-745-4242
Fax: 613-745-5560
Email mbaker -at- omnimark -dot- com
Web: http://www.omnimark.com