Single sourcing, XML, and DITA

Subject: Single sourcing, XML, and DITA
From: mpriestl -at- ca -dot- ibm -dot- com
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Tue, 11 Sep 2001 16:35:07 -0400

Without responding in detail to the many interesting points that have been
raised in these threads, I'm going to resort to cut and paste :-) from a
SIGDOC paper I'll be presenting in October.

This is some of the meatier info of the paper, where I was trying to cover
some of the various ways that topic-oriented XML (and the separation of
content from context) allows reuse and single sourcing. Looking back, one of
the main aspects that I didn't cover was progressive disclosure: having a
consistent structure at various levels of detail lets users drill down
quickly to pertinent information, and lets detailed documents be reused in
stripped-down form for media-constrained use (like popups, PDAs, etc.).

This section jumps right in on reuse in DITA in particular, and may be a
little hard to read without some background, which you can get at:

http://www-106.ibm.com/developerworks/xml/library/x-dita1/index.html
http://www-3.ibm.com/ibm/easy/eou_ext.nsf/Publish/1819

If you want the rest of this paper, though, you'll need to get the SIGDOC
proceedings :-/


3. Content reuse
When information (such as concepts, tasks, and reference topics) is
assembled or aggregated into new contexts, there are three things that can
make the process easier:

· Consistently chunked information (all the units you are assembling are
the same size, or of a predictable size)
· Context-free information that focuses on a single task, idea, or
thing, with as few external references or dependencies as possible
· Automatic inclusion as part of a repeating process, so that there is
only one copy of the source, and when it gets changed all the places that
reuse the source pick up the change without manual intervention

The alternative (manually scanning through rambling documents, then cutting
and pasting the applicable content into the new document you are creating)
is time-intensive, error-prone, and non-scalable in terms of maintenance.
Every time you copy information, you create a new place it must be
maintained. In other words, reusing the information once doubles the cost
of maintaining it; reusing it twice triples it, and so on.

In contrast, automatic inclusion does not increase the cost of maintenance.
Every time you change the original, all contexts pick up the change.
However, it puts considerable pressure on the writer to keep the reusable
content as free from context as possible.
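The maintenance argument can be made concrete with a small Python sketch. It is not taken from the paper. The `source` store, the `render` function, and the `("ref", key)` convention are hypothetical stand-ins for whatever inclusion mechanism a real toolchain provides. The sketch shows that a copied unit goes stale when the original changes, while an included unit does not.

```python
# Hypothetical single-source store: exactly one copy of each reusable unit.
source = {"install-note": "Back up your data before installing."}

# Copy-and-paste: the document holds its own duplicate of the text.
copied_doc = ["Chapter 1", source["install-note"]]
# Automatic inclusion: the document holds only a reference into the store.
included_doc = ["Chapter 1", ("ref", "install-note")]


def render(doc, store):
    """Resolve ("ref", key) parts against the store; pass plain text through."""
    return [store[part[1]] if isinstance(part, tuple) and part[0] == "ref" else part
            for part in doc]


# Change the single source once.
source["install-note"] = "Back up your data and close all programs before installing."

print(render(copied_doc, source)[1])    # old text: the copy is a second place to fix
print(render(included_doc, source)[1])  # new text: picked up with no manual step
```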

When you include content automatically, there are two standard ways to pick
which content to include:

· As a property of the source that marks it as a candidate for certain
kinds of inclusion
· As a reference from a context document or navigation map, that points
specifically to the content it wants to include

The advantage of property-based selection is that you don't have to add or
maintain a specific reference to the topic from anywhere else: the point of
inclusion only has to identify the criteria for inclusion; it doesn't have
to exhaustively list each of the information units that meets those
criteria.
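Property-based selection can be sketched in Python as an illustration only. The topic records, the `audience` values, and `select_by_property` are all hypothetical. The point of inclusion passes criteria, and any unit whose properties match is picked up, including units added later.

```python
# Hypothetical topics that carry their own audience property.
topics = [
    {"id": "install", "audience": "novice"},
    {"id": "tuning", "audience": "expert"},
    {"id": "backup", "audience": "experienced"},
    {"id": "recovery", "audience": "expert"},
]


def select_by_property(units, **criteria):
    """Return the IDs of units whose properties match every criterion.

    The point of inclusion names only the criteria, never individual topics.
    """
    return [u["id"] for u in units
            if all(u.get(key) == value for key, value in criteria.items())]


print(select_by_property(topics, audience="expert"))  # ['tuning', 'recovery']

# A new expert topic is picked up without touching the point of inclusion.
topics.append({"id": "clustering", "audience": "expert"})
print(select_by_property(topics, audience="expert"))  # ['tuning', 'recovery', 'clustering']
```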

The disadvantage of property-based selection is that sometimes the
properties themselves don't provide enough information, and you are then
faced with the task of updating the properties of every unit you hope to
reuse. For example, suppose you had set properties on 1000 units that
identified them as applying to novice, experienced, or expert users, and
then realized that you needed to subdivide those categories further, say
into database administrator, Java programmer, and C++ programmer. You are
then stuck with updating 1000 files to reflect the new properties for each.
A task that previously applied to expert users, for instance, might now
need to be marked as an advanced task for programmers but a simple novice
task for database administrators.

The advantage of context or map-based references is that you can update the
criteria for selection without affecting the units you are selecting. So in
the previous example, instead of updating 1000 files to reflect the
proliferation of properties, you simply expand or even split the one
original map (which distinguished between three types of user) so that it
makes the necessary distinctions (for example, one map for each user role).
Adding new properties, as new needs arise, does not affect the units being
reused, and therefore does not affect others reusing the same units.
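Here is a comparable sketch of map-based selection, under the same caveats: the topic IDs are made up, and plain dicts stand in for a real map format. The topics carry no audience information. Splitting the original level-based map into per-role maps therefore changes how a topic is classified without editing the topic itself.

```python
# Topics carry no audience properties: pure, context-free content (made-up IDs).
topics = {
    "install": "How to install the product.",
    "tuning": "How to tune the database.",
    "backup": "How to back up data.",
    "jdbc": "How to connect from Java.",
}

# The original single map, divided by audience level.
by_level = {"novice": ["install"], "experienced": ["backup"], "expert": ["tuning", "jdbc"]}

# New needs arise: one map per role. Only maps are written; no topic is edited.
# "tuning" becomes a novice-level task for DBAs but stays expert-level for Java.
by_role = {
    "dba": {"novice": ["install", "tuning"], "experienced": ["backup"]},
    "java": {"novice": ["install"], "expert": ["tuning", "jdbc"]},
}


def level_of(topic_id, role_map):
    """Return the level a topic sits at in one map, or None if it is absent."""
    for level, refs in role_map.items():
        if topic_id in refs:
            return level
    return None


def assemble(role_map, store):
    """Resolve every topic reference in a map, level by level, against the store."""
    return {level: [store[ref] for ref in refs] for level, refs in role_map.items()}


print(level_of("tuning", by_level))         # expert
print(level_of("tuning", by_role["dba"]))   # novice
print(level_of("tuning", by_role["java"]))  # expert
```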

The disadvantage of context or map-based references is that they can be
mind-numbingly literal to maintain. For example, if you wanted to include
the documentation for 1000 C++ classes in a reference manual, it seems
pointless to maintain a list of those classes that must be updated each
time a new class gets generated, when you could simply include each class
based on whether it is set as public or private.
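The C++ case can be sketched as follows. The `access` property and the class names are assumptions for illustration. The point is only that a rule keyed to a stable property picks up a newly generated class, while a literal list does not until someone edits it.

```python
# Hypothetical generated reference topics; "access" is an assumed, stable
# property that the documentation generator records for each class.
classes = [
    {"name": "Parser", "access": "public"},
    {"name": "Lexer", "access": "public"},
    {"name": "TokenCache", "access": "private"},
]

# Map style: a literal list that someone must keep in sync by hand.
manual_list = ["Parser", "Lexer"]


def public_classes(units):
    """Property style: include every class whose access setting is public."""
    return [u["name"] for u in units if u["access"] == "public"]


# The generator produces a new public class.
classes.append({"name": "Formatter", "access": "public"})

print(public_classes(classes))  # ['Parser', 'Lexer', 'Formatter']
print(manual_list)              # ['Parser', 'Lexer']: stale until someone edits it
```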

In practice, there are times when either approach is reasonable. For
example, when you want to include based on a very stable property (such as
one defined by a programming language standard), that's pretty safe. But if
you want to include based on something more volatile (such as audience
analysis), you may be better off maintaining an explicit list or map
instead of properties on each unit.

Theoretically, properties and maps define the same kinds of information,
and one can be transformed into the other as needed. For example, a map can
be derived from properties on a set of units, and properties can be set
based on listings in a set of maps. For many types of properties, it may
be most appropriate to maintain the information in map form (which is the
most maintainable, and keeps the topics as free from context as possible)
and then process the map to set properties in each topic as they are
published for a particular delivery context.
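Both directions of that transformation can be sketched in Python. The sketch simplifies by treating a map as a flat list of topic IDs, although real maps also carry hierarchy. The function names are hypothetical.

```python
from collections import defaultdict


def properties_from_maps(maps):
    """Invert {map_name: [topic_id, ...]} into {topic_id: {map_name, ...}}.

    Run at publish time, this sets properties for a delivery context
    without ever storing them in the source topics.
    """
    props = defaultdict(set)
    for map_name, refs in maps.items():
        for ref in refs:
            props[ref].add(map_name)
    return dict(props)


def maps_from_properties(props):
    """Derive {map_name: [topic_id, ...]} from per-topic properties.

    Flat properties carry no ordering, so topics come back sorted by ID.
    """
    maps = defaultdict(list)
    for topic_id in sorted(props):
        for map_name in props[topic_id]:
            maps[map_name].append(topic_id)
    return dict(maps)


maps = {"dba": ["backup", "install"], "java": ["install", "jdbc"]}
props = properties_from_maps(maps)
print(sorted(props["install"]))             # ['dba', 'java']
print(maps_from_properties(props) == maps)  # True
```

The sketch also makes one limitation visible. Flat properties do not record the order a map imposes, so the round trip reproduces the original maps here only because the sample lists are already sorted.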

3.1 The continuum of reusability
DITA allows topics to be authored together, as a nested hierarchical
structure, with relationships among different topics, metadata, and various
other features that, strictly speaking, tie the topic to a particular
context. However, the structure of a topic works to compartmentalize the
contextual features, and preserve the reusable elements for easy access.

Topics have the following high-level structure:

<topic>
  <title>...</title>
  <prolog>...</prolog>
  <body>...</body>
  <topic>...</topic>
  <topic>...</topic>
</topic>

That is, a title, prolog, and body, followed by any number of nested
topics.

The contents of the prolog (largely, metadata and relationships), and the
content after the body (nested topics) are context: they embed the topic in
a structure, they make assumptions about the existence of other topics, and
potentially about the product, platform, and audience to which the topic
applies. All of this matter is subject to change if the topic is reused in
another context: some topics may no longer be available or applicable, the
product and platform may have changed, and the definition of the audience
may need refining.

A context-free topic, created for maximal reuse, may only allow this simple
structure:

<topic>
  <title>...</title>
  <body>...</body>
</topic>

In other words, no prolog (with its relationships and metadata) and no
nested topics.
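Reducing a context-rich topic to that context-free form can be sketched with Python's standard `xml.etree.ElementTree`. The sample XML is illustrative and has not been validated against any DITA DTD. In particular, the `<audience>` element inside the prolog is made up, and `strip_context` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

REUSABLE = {"title", "body"}  # per the structure above, the context-free parts


def strip_context(topic):
    """Split a topic into a context-free copy and its context elements.

    The copy keeps only <title> and <body>; the prolog and any nested
    <topic> elements come back separately, e.g. to be moved into a map.
    """
    free = ET.Element(topic.tag, dict(topic.attrib))
    context = []
    for child in topic:
        if child.tag in REUSABLE:
            free.append(child)
        else:
            context.append(child)
    return free, context


doc = ET.fromstring(
    "<topic id='backup'><title>Backing up</title>"
    "<prolog><audience>dba</audience></prolog>"
    "<body><p>Run the backup command.</p></body>"
    "<topic id='schedule'><title>Scheduling backups</title><body/></topic>"
    "</topic>"
)
free, context = strip_context(doc)
print(ET.tostring(free, encoding="unicode"))
print([c.tag for c in context])  # ['prolog', 'topic']
```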

This topic can then be combined with information stored for a particular
context in the form of a separate document (a context or navigation map),
which provides the necessary structures and data to populate the prolog
and, if necessary, assemble topics into larger compound documents.
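A map might supply that context as in the following sketch. The dict-shaped map, its `meta` entries, and the prolog element names generated from them are all hypothetical. The sketch only shows the division of responsibilities: the topics contribute title and body, and the map contributes prolog data and nesting.

```python
import copy
import xml.etree.ElementTree as ET

# Context-free topics: title and body only (illustrative content).
store = {
    "backup": ET.fromstring("<topic id='backup'><title>Backing up</title>"
                            "<body><p>Run the backup command.</p></body></topic>"),
    "schedule": ET.fromstring("<topic id='schedule'><title>Scheduling backups</title>"
                              "<body><p>Use the scheduler.</p></body></topic>"),
}

# Hypothetical context map: nesting plus context-specific metadata for each topic.
context_map = {
    "ref": "backup",
    "meta": {"audience": "dba", "platform": "unix"},
    "children": [{"ref": "schedule", "meta": {"audience": "dba"}, "children": []}],
}


def assemble(node, topics):
    """Build a compound document: title, prolog from the map, body, nested topics."""
    src = topics[node["ref"]]
    topic = ET.Element("topic", dict(src.attrib))
    topic.append(copy.deepcopy(src.find("title")))
    prolog = ET.SubElement(topic, "prolog")
    for name, value in node["meta"].items():
        ET.SubElement(prolog, name).text = value  # made-up metadata element names
    topic.append(copy.deepcopy(src.find("body")))
    for child in node["children"]:
        topic.append(assemble(child, topics))  # nesting comes from the map
    return topic


book = assemble(context_map, store)
print([c.tag for c in book])  # ['title', 'prolog', 'body', 'topic']
```

Deep copies keep the stored topics untouched, so the same topics can be assembled by a different map for another context.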

For any changes that need to be made within the body of a topic, you can
set a variety of properties on any element within the body. These
properties, including audience, platform, and product, allow more
traditional filtering methods to be applied to the content of a topic,
excluding elements when their properties flag them as inappropriate for a
context.
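Filtering on those properties might look like the following sketch. The attribute names come from the properties listed above. The matching rules are assumptions, not a statement of how any DITA processor behaves: values may be space-separated, and unmarked elements apply everywhere.

```python
import xml.etree.ElementTree as ET

FILTER_ATTRS = ("audience", "platform", "product")


def filter_body(elem, context):
    """Remove descendants whose properties rule them out for this context.

    Assumptions for this sketch: an attribute may hold several space-separated
    values; an element without the attribute applies everywhere; and only the
    dimensions named in `context` are filtered on.
    """
    for child in list(elem):  # iterate over a copy, since we remove while looping
        excluded = any(
            attr in context and attr in child.attrib
            and context[attr] not in child.get(attr).split()
            for attr in FILTER_ATTRS
        )
        if excluded:
            elem.remove(child)
        else:
            filter_body(child, context)
    return elem


body = ET.fromstring(
    "<body><p>Common step.</p>"
    "<p platform='windows'>Click Start.</p>"
    "<p platform='unix linux'>Open a shell.</p>"
    "<ul><li audience='expert'>Tune the cache.</li><li>Restart.</li></ul></body>"
)
filter_body(body, {"platform": "linux", "audience": "novice"})
print([e.text for e in body.iter() if e.text])
# ['Common step.', 'Open a shell.', 'Restart.']
```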

The more you can depend on maps, and on reuse at the topic level, the more
scalable your reuse is. It is often better to add things in (by applying
maps) than to filter things out (based on properties). Given that both maps
and properties are specific to particular contexts, you can add new
contexts with maps simply by defining new maps; but adding new contexts
with properties requires editing each of the affected topics: the
separation of content from context is compromised.

The simple topic, with only title and body and without use of the various
context attributes, represents the most reusable form of DITA content.
However, it is at one end of a continuum: if you are authoring topics for a
more constrained environment, in which the opportunities for reuse are
well-understood in advance, it may very well be appropriate to sacrifice
some of these principles on the altar of pragmatism, and create a more
context-rich topic, adding related links, nesting structures, and metadata
as part of the topic itself rather than in a separate context or
navigation map.

The way that DITA compartmentalizes its content, with the body holding all
the reusable content of a topic, allows you to easily revisit your reuse
strategy at various points in your documentation lifecycle. For example, it
may be appropriate to author your topics as a single document of nested
topics when you first begin your project: getting information out fast, in
a single context, may be enough of a goal for a first draft, or a beta, or
even a version 1. But in later releases or drafts, as maintenance becomes
more of an issue and your documentation potentially begins to spin off into
different versions for variations of product, audience, platform, or other
factors, you can choose to release your topics from their authoring contexts,
and refactor them into individual topic documents that are related only by
the maps that reference them.
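The refactoring step can be sketched in Python as well. The `refactor` function and the dict that stands in for a map are hypothetical. The sketch splits nested topics into standalone topic documents and moves the hierarchy into a separate map. Prologs stay with their topics here, although in practice you might move them into the map too.

```python
import xml.etree.ElementTree as ET


def refactor(topic, docs):
    """Release nested topics from their authoring context.

    Each topic, minus its nested topics, is stored in `docs` under its id as a
    standalone document; the returned dict records the hierarchy that the
    nesting used to carry, so a map now relates the topics instead.
    """
    standalone = ET.Element("topic", dict(topic.attrib))
    children = []
    for child in topic:
        if child.tag == "topic":
            children.append(refactor(child, docs))
        else:
            standalone.append(child)  # title, prolog, body stay with the topic
    docs[topic.get("id")] = standalone
    return {"ref": topic.get("id"), "children": children}


draft = ET.fromstring(
    "<topic id='backup'><title>Backing up</title><body/>"
    "<topic id='schedule'><title>Scheduling</title><body/>"
    "<topic id='cron'><title>Using cron</title><body/></topic></topic>"
    "<topic id='restore'><title>Restoring</title><body/></topic></topic>"
)
docs = {}
nav_map = refactor(draft, docs)
print(sorted(docs))  # ['backup', 'cron', 'restore', 'schedule']
print(nav_map)
```

Each standalone topic could then be written to its own file, for example with `ET.ElementTree(docs[name]).write(name + ".xml")`.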


Michael Priestley
DITA Specialization Architect
mpriestl -at- ca -dot- ibm -dot- com
Dept 833 IBM Canada t/l: 969-3233 phone: 905-413-3233
Toronto Information Development

