Single sourcing & content management: getting the most out of reusable text

Subject: Single sourcing & content management: getting the most out of reusable text
From: HALL Bill <bill -dot- hall -at- tenix -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Thu, 14 Jun 2001 15:07:27 +1000 (EST)

The "real value" debate rises again.

Technical documentation, whether for software or hardware systems, typically
involves the production and regurgitation of redundant text. The nirvana we
creative types seek is to "write once and use many times". Structured
authoring and content management systems potentially offer a lot of help,
but as several of our "neo-Luddite" contributors have quite reasonably
observed, the technology can be costly in $$ and labour to implement. It
is very easy to start projects that end up costing much more to implement
and maintain than they can ever return in benefits. As in most situations
where IT is being applied, we can easily get into deep doo doo if we don't
have a clear understanding of what we are trying to achieve.

A case in point is the US Defense Department's "paperless contracting"
effort. I am particularly interested in this story because I think that most
of the requirements this bespoke system tries to address could be met with
generic XML standards and applications for one to five percent of what DoD
is going to spend on the project (another pitch for LegalXML -
http://www.legalxml.org/contracts). According to the DoD's Inspector
(Auditor) General
(http://www.fcw.com/fcw/articles/2001/0319/pol-dodbuy-03-19-01.asp;
http://www.dodig.osd.mil/audit/reports/fy01/01-075.pdf), this has already
cost more than $US 400 M, with an eventual lifecycle cost from 1995-2005 of
$US 3.7 BN, to return only $US 1.4 BN in savings from increased productivity
and cost reduction (see also http://pd2.ams.com/;
http://www.fcw.com/fcw/articles/2001/0305/web-sps-03-05-01.asp).

Where reducing redundant text is concerned, many difficulties arise from not
being able to clearly define what it is we are trying to do in order to
write once and use many times. To get the best payback from reducing
redundancy, you need to understand there are several different methodologies
you can use, depending on the circumstances. Following is an essay I put
together to help us clarify these kinds of issues. Examples are based on
Tenix's documentation requirements for the ANZAC ships. I am sure that
similar examples could be drawn from many software documentation projects.

The essay assumes that documents are authored in a structured environment
(e.g., SGML or XML) under control of a DTD. With substantial effort to
develop bespoke controls, some of the ideas may also be applicable to a word
processing environment using merge-printing functions, or to unstructured
FrameMaker with conditional text (as demonstrated by Hedley Finger).
However, our considered opinion after several years' experience with both
kinds of systems is that essentially all new work for a new project is
better authored in a structured environment than in any word processing
system we have used to date. This holds even though DTD development is a
significant up-front cost, and even before counting the additional savings
that can be achieved through text reuse.

--------

1. What Is Redundant Text and Why Does it Need to Be Managed?

Engineering project documentation often contains large amounts of redundant
information both within single documents and across several to many
documents. There are a number of reasons for this, which create problems if
they are not managed, and which lead to substantial labour savings if they
are managed appropriately.

The redundancy arises because the overall product delivered by a project is
normally broken down into a number of hierarchically organised systems,
subsystems and components for analysis, design, manufacture and support.
Most documentation focuses on the system, subsystem and/or component level.
The requirements for documents for each kind of element at a given
hierarchical level tend to be common across the entire system or project,
and consequently documents have structures and elements of textual content
which will be repeated many times - whether the documents are design
studies, contracts, or maintenance procedures.

Other sources of redundancy are due to the flow down of information
components from project or system level documents into the sub-system and
component level documents; or historically from design studies through
tenders, contracts to documentation deliverables.

There are at least two kinds of redundancy: identical and situational.
Identically redundant information is exactly that. The text or other
information is and should be identical wherever it occurs (e.g., most common
warnings, cautions and notes). In this case, variants should be recognised
and eliminated. Situationally redundant information refers to those many
cases where the structure and most aspects of the content are intended to be
identical but where some aspects of the information must reflect surrounding
circumstances (e.g., precedent based clauses in a contract which need to
follow the precedent but must also appropriately refer to the parties and
other circumstances of the particular contract.)

The goals for managing this redundancy are to ensure that text that is
supposed to be identical always is identical and that, for situationally
redundant information, the situational variables have been appropriately
dealt with.
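
As a rough illustration of situational redundancy, the sketch below keeps
the fixed wording of a clause in one place and fills in only the situational
variables. It is a minimal sketch in Python, not a description of any
particular system: the clause wording, the variable names (supplier,
customer, days) and the function render_clause are all hypothetical.

```python
from string import Template

# Hypothetical precedent clause: the wording is fixed, only the
# situational variables ($supplier, $customer, $days) change.
CLAUSE = Template(
    "$supplier shall deliver the documentation to $customer "
    "within $days days of acceptance."
)


def render_clause(variables: dict) -> str:
    """Fill in the situational variables.

    Template.substitute raises KeyError if a variable is missing, so an
    incompletely specified clause cannot slip through silently.
    """
    return CLAUSE.substitute(variables)


if __name__ == "__main__":
    print(render_clause(
        {"supplier": "Supplier A", "customer": "Customer B", "days": 30}
    ))
```

Failing loudly on a missing variable is the point of the design: the
identical part cannot drift, and the situational part cannot be forgotten.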

The management goals and requirements to reduce labour can be met if the
redundant information can be "normalised" in the sense of reducing redundant
texts to single locations that can be authored once and managed once for use
in multiple documents. For this normalisation to be possible, an authoring
environment will benefit from having the following capabilities:

o Facility to identify similar texts (perhaps only a simple query facility)

o Ability for many documents to share text from a single location

o Version and history management (identify families of texts derived from a
common ancestor)

o Ability to include variable elements within standard texts.
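
The first capability in the list (identifying similar texts) can be
approximated with very simple tooling. The sketch below uses Python's
standard difflib module to flag pairs of texts that are nearly identical and
therefore candidates for normalisation. The function name similar_pairs, the
0.8 threshold and the sample texts are assumptions made for illustration
only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(texts: dict, threshold: float = 0.8) -> list:
    """Return (id_a, id_b, ratio) for each pair of texts whose similarity
    ratio is at or above the threshold.

    Comparison is case-insensitive. Every pair is compared, so this is
    only practical for modest numbers of texts.
    """
    pairs = []
    for (id_a, a), (id_b, b) in combinations(texts.items(), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample texts keyed by an arbitrary identifier.
    sample = {
        "w1": "WARNING: Isolate power before removing the cover.",
        "w2": "Warning: isolate power before removing cover.",
        "n1": "Lubricate the bearing every 500 hours.",
    }
    for id_a, id_b, ratio in similar_pairs(sample):
        print(id_a, id_b, ratio)
```

A human still has to decide whether a flagged pair is a variant to be
eliminated or a legitimate situational difference.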


2. Management Methodologies

Three kinds of methodologies can be implemented to reduce or eliminate
redundancy within and across documents to allow the information to be
managed at a single location.

The "Master Document" approach is particularly appropriate for dealing with
situations where there are language and configuration variants for what
would otherwise be considered to be a single document (e.g., maintenance
routines for self-contained air conditioning units, where the one procedure
applies to all units, but different configuration identifiers are required
because of minor variations in design, and where the one document must
suffice for two navies).

The "Virtual Document" approach may be applied to any document that contains
elements of reusable information (e.g., the same warning may be used in a
technical manual, maintenance routine, or even a technical repair
specification).

SGML/XML allows "Reusable Entities" to be created for any common element. If
these are appropriately defined and managed, they can be included in any
document where required.

2.1 "Master Document"

2.1.1 Summary

All variant information is held in a single "master" document. Because there
is only a single document there is no redundant text. Variant paragraphs in
the linear structure of the master document are held side-by-side in
configuration managed elements.

Individualised deliverables are resolved by an output process in the
document management environment (e.g., resolution of language differences)
or a configuration management process in the end-user's environment (e.g.,
applicability to specific ships determined within the AMPS system).

This is essentially the model currently used for the ANZAC Ship maintenance
requirement cards - and it has proved highly successful and cost-effective
in this environment.
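
To make the master document idea concrete, here is a minimal sketch in
Python of an output process that resolves one master into one deliverable.
The element name "variant", the attribute "applic", the values "navy-a" and
"navy-b" and the sample content are all invented for illustration; they are
not taken from the ANZAC DTDs, and the real process is considerably more
involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical master: variant paragraphs sit side by side, each tagged
# with the configuration it applies to.
MASTER = """<procedure>
  <step>Isolate power to the unit.</step>
  <variant applic="navy-a">Record the work on form A.</variant>
  <variant applic="navy-b">Record the work on form B.</variant>
  <step>Restore power.</step>
</procedure>"""


def resolve(master_xml: str, applic: str) -> ET.Element:
    """Return a copy of the master with non-applicable variants removed."""
    root = ET.fromstring(master_xml)
    # Work on snapshots (list(...)) so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "variant" and child.get("applic") != applic:
                parent.remove(child)
    return root


if __name__ == "__main__":
    deliverable = resolve(MASTER, "navy-a")
    print(ET.tostring(deliverable, encoding="unicode"))
```

Note that the sequence of elements is the same in every output, which is
exactly the limitation described in 2.1.3.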

2.1.2 Advantages

Only one master document is required to be configuration managed. This
appears to be the most practical model for dealing with multiple language
requirements or for minor configuration related variants. [This may also be
a suitable approach for different output formats - e.g., printed vs on-line
versions].

2.1.3 Limitations

Master documents do not address redundancy across different master documents
or serial redundancies within the one master document. All outputs must have
an identical sequential structure able to accommodate the variant elements.

2.1.4 Disadvantages

Substantial bespoke programming is required to resolve the single master
document into the range of deliverables required. This programming will in
most cases be specific to a particular document type and cannot readily be
reused. Care needs to be taken when building processes to extract
quantitative information from the documents for logistic analysis purposes
to ensure that all configuration related issues are appropriately resolved.

2.2 "Virtual Documents with Shared Elements"

2.2.1 Summary

A configuration managed "shell" document is maintained for each variant
deliverable. This shell contains all unique content belonging directly to
the variant deliverable. For information that is redundant across two or
more shells or between two or more locations within the one shell, the
information is held and indexed in one location and all other locations
which use that information simply point to the indexed location.

If the documentation management system has the appropriate functionality,
both virtual and master concepts can be applied to the same body of
documentation (e.g., multi-language elements may be held within virtual
documents).
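
The shell-and-pointer arrangement can be sketched as follows. This is a
minimal illustration in Python under stated assumptions: the "include"
element, its "ref" attribute, the identifier W001 and the in-memory
dictionary standing in for the indexed store are all hypothetical, and a
real content management system would add versioning, locking and access
control on top.

```python
import xml.etree.ElementTree as ET

# Hypothetical indexed store: each shared element is held exactly once.
SHARED = {
    "W001": "<warning>Isolate power before opening the cover.</warning>",
}

# Hypothetical shell: unique content plus pointers to shared elements.
SHELL = (
    "<manual><title>Unit X</title>"
    '<include ref="W001"/>'
    "<para>Text unique to this manual.</para>"
    '<include ref="W001"/></manual>'
)


def assemble(shell_xml: str, shared: dict) -> ET.Element:
    """Replace each <include ref="..."/> with the shared element it names.

    Raises KeyError if a pointer names an element that is not in the
    store. Includes nested inside shared elements are not followed.
    """
    root = ET.fromstring(shell_xml)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                element = ET.fromstring(shared[child.get("ref")])
                element.tail = child.tail  # keep any text after the pointer
                parent[index] = element
    return root


if __name__ == "__main__":
    print(ET.tostring(assemble(SHELL, SHARED), encoding="unicode"))
```

Editing the one entry in the store changes every deliverable that points to
it, which is both the benefit and the reason the situational checks in
2.2.3 matter.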

2.2.2 Advantages

Except for the requirement to maintain a shell structure for each
deliverable document, the virtual document approach has the capacity to
eliminate all textual redundancy at the level of complete elements. This
greatly reduces the volume of text required to be managed and guarantees
standardisation (all uses of the element point back to the single unique
existence of that element). Quantitative extracts make much more sense -
from the extract process's point of view, each document extracted is a
complete instance of that document type. A given element is only present
once in the document.

2.2.3 Limitations

Care needs to be taken to ensure that the reused text is situationally
correct for hierarchy and document specifics. (Note: in many cases a
production tool can provide correct variable information, such as paragraph
numbers and cross reference details within the reusable elements.)

2.2.4 Disadvantages

Although the functionality is generic to all kinds of documents under
management, virtual document methodologies are entirely dependent on
capabilities of a sophisticated content management application to locate,
manage and process the shared texts. Also, by comparison to a master
document, where the single master may replace many individual documents, the
virtual document approach requires a separate shell document to be
established and managed for each document instance to be delivered. Where
documents contain complex structures but little text in each structural
element, storage requirements for shells may be significant.

2.3 "Reusable Entities"

2.3.1 Summary

The standards for SGML/XML define the concept of an "entity" which is any
kind of object able to be defined and reused by an application processing
instances of the document conforming to a particular document type
definition (DTD). Entities may be defined within the particular DTD, or the
DTD may reference external files defining entities able to be used within
documents conforming to the DTD.

A specific entity may be defined for each bit of reusable text, ranging from
a complex element containing a number of subelements, down to a single
foreign character or symbol. Once defined, these reusable entities are then
allowed to be used anywhere in the structure of a document instance where it
makes sense to do so.
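
A small example may help here. The sketch below declares two text entities
in the internal DTD subset of an XML document and lets a standard parser
expand them; Python's xml.etree.ElementTree is used only because it is
widely available. The entity names (warn-power, unit-name), the element
names and the wording are hypothetical. In practice the declarations would
more likely live in the DTD or in an external entity file, as described
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: entities are declared once, then referenced
# wherever the text is needed.
DOCUMENT = """<!DOCTYPE routine [
  <!ENTITY warn-power "Isolate power before opening the cover.">
  <!ENTITY unit-name "air conditioning unit">
]>
<routine>
  <warning>&warn-power;</warning>
  <step>Remove the filter from the &unit-name;.</step>
  <step>Refit the filter to the &unit-name;.</step>
</routine>"""


def expand(document: str) -> ET.Element:
    """Parse the document; the parser expands internal entities itself."""
    return ET.fromstring(document)


if __name__ == "__main__":
    root = expand(DOCUMENT)
    for element in root:
        print(element.tag + ": " + element.text)
```

No special database or output processing is involved: any conforming parser
resolves the references, which is the advantage claimed for this approach.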

2.3.2 Advantages

Reusable entities are totally conformant to the rules of SGML/XML and
require no special database or output processing functions to resolve them
in the production of a deliverable document or instance. Entities may
provide an appropriate way to manage texts which ideally should be managed
outside of the normal versioning controls applied to a complete document
instance (e.g., it may be appropriate to manage the contents of a particular
warning independently from the version status of a document containing that
entity.)

2.3.3 Limitations

Entities must be defined and established according to the rules of the
particular DTD before they can be used in any particular document instance.
Contents of the entity are not parsed at the time of creation or entry into
a document instance. Also, unless changes to the content of a particular
entity are specifically notified by special management processes, these
changes are not likely to be recognised by the document configuration
management system as changes to document instances using that entity.

2.3.4 Disadvantages

Entities must be defined independently of a document instance before they
can be used. This is potentially labour intensive and it will require those
defining and establishing the entities to have a thorough understanding of
what they are doing. Authors must also have a good understanding of what
entities have been established and how to incorporate them in the text of a
particular document instance.

Although methodologies for reusing entities may be very useful for managing
some kinds of highly repetitive texts (e.g., standard warnings, cautions and
notes), this will probably require a separate change management methodology
to be established from that applicable to the remainder of the
documentation.


3. Considerations

The three methodologies for managing redundant texts are not exclusive, and
in fact all three can potentially be used in the same document type to
better eliminate and manage otherwise redundant information.

3.1 Master Documents

Master document concepts have already been implemented in the content
management system for the maintenance requirement card records, and in fact
were essential to address the requirements of the electronic deliverable.
However, the currently developed facilities are unique to this particular
document type. Because of the magnitude of the programming required to
establish these kinds of master documents, it is probably not immediately
appropriate to apply master document methodologies to other classes of
documentation likely to be taken up into the SIM environment.

Over the longer term, it would be useful to develop a generic capability for
dual/multi elements identified by applicability and effectivity, similar in
concept to that used for dual languages. This would be useful (though
probably not cost justified) for ANZAC technical manuals complying with the
Australian Defence standards. It may be essential for other fleet
maintenance documentation opportunities (e.g., airframes and vehicles) where
there are many more configuration variants to manage within a generic
document structure.

3.2 Virtual Documents with Reusable Elements

The only methodology applicable to all kinds of redundancy at the element
level (whether compound or simple) is to provide a generic capability to
enable the reuse of otherwise redundant elements. So long as the content
within the element is the same, it may be used anywhere that content is
required in any document. As such, this should be a core function provided
by the content management system since it will facilitate normalisation of
content across any arbitrary DTD.

Note that this functionality would also be useful in such specialised
circumstances as a contract management system, where situationally variable
texts could be managed as shared entities without disrupting the reusability
of elements containing the entities which were otherwise redundant.

3.3 Reusable Entities

Because they are an integral part of the SGML/XML languages, reusable text
entities can be implemented manually, and should need only very slight
changes to existing DTDs and viewers in the SIM environment. It may also be
possible to bring entities under a simple form of version control within any
content management system offering an object repository function as normally
provided for graphics entities.

Entities would be entirely appropriate for managing stock-standard texts
such as warnings, cautions and notes, and for dealing with situationally
variable information within otherwise redundant elements managed within
virtual documents.

It may also be possible that an entity-based architecture could be
generalised to allow the identification and management (as entities) of
texts at arbitrary paragraph levels in a document. It is also possible that
the use of entities as a work-around for generically reusable elements might
conflict with the use of entities for situational variables - which I
believe to be a more appropriate use of the capability.


Note to Techwhirlers: I am aware of at least one low-cost (i.e., free to
small teams!) XML content management application that uses entities as the
basis for managing and reusing elements (i.e., to provide "Virtual Documents
with Shared Elements" type management) - see SiberLogic's SiberSafe:
http://www.siberlogic.com/. From the specs, this newly released product
looks like it ought to work for moderate documentation bases such as a
couple of our technical manuals, but we haven't actually tested it yet. We
are still working to SGML standards, and the application would require a
processing interface to be developed to tweak the SGML into acceptable XML
for management and back again to SGML for delivery. We still haven't done
our sums to decide whether the tweak would be cost-effective against the
small amount of work left on this documentation set.


Bill Hall
Documentation Systems Specialist
Data Quality
Quality Control and Commissioning
Tenix ANZAC Ship Project
Williamstown, Vic. 3016 AUSTRALIA
Email: bill -dot- hall -at- tenix -dot- com
URL: http://www.tenix.com/
