RE: Looking for classes in indexing


Subject: RE: Looking for classes in indexing
From: Slager Timothy J <Timothy -dot- Slager -at- dematic -dot- com>
To: "mbaker -at- analecta -dot- com" <mbaker -at- analecta -dot- com>, 'Jonathan Baker' <jbaker2525 -at- gmail -dot- com>, "dick -at- rlhamilton -dot- net" <dick -at- rlhamilton -dot- net>
Date: Fri, 27 Jul 2018 14:31:48 +0000

Mark,
Be sure to post a link here when you finish that blog post!

-----Original Message-----
From: techwr-l-bounces+timothy -dot- slager=dematic -dot- com -at- lists -dot- techwr-l -dot- com [mailto:techwr-l-bounces+timothy -dot- slager=dematic -dot- com -at- lists -dot- techwr-l -dot- com] On Behalf Of mbaker -at- analecta -dot- com
Sent: Friday, July 27, 2018 9:49 AM
To: 'Jonathan Baker'; dick -at- rlhamilton -dot- net
Cc: techwr-l -at- lists -dot- techwr-l -dot- com
Subject: RE: Looking for classes in indexing

Index automation can only take you so far. For the book I referred to earlier, we experimented with index automation. The book is about structured writing, so naturally we used structured writing techniques. This included the annotation of subjects mentioned in the text, and of the subjects covered in chapters and sections. This is essentially what an index does -- it points you to the places where different subjects are treated in a document. So it follows that we should be able to derive an index from these annotations.

This is already a much more controlled process than using software to scan an unstructured text for keywords, which is all that automated indexing software can do, short of an AI revolution that has not arrived yet. The subject annotations that we used noted the type of the subject and rectified the terminology (for example: {XML}(markup-language "Extensible Markup Language")). This gave us a significant degree of terminology control and allowed us to detect a lot of inconsistency in the book (as Richard mentioned earlier). It also allowed us to automatically create entries for the major types of subject matter discussed in the book:

markup languages
  XML, 56, 68, 72
  HTML, 345, 403
  Markdown, 3, 76, 432
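For readers wondering how annotations like {XML}(markup-language "Extensible Markup Language") could be rolled up into typed entries like the one above, here is a minimal Python sketch. Everything in it is an assumption: the annotation grammar is inferred from the single example in this post, and the TYPE_HEADINGS mapping and the build_index/render names are hypothetical, not the toolchain actually used for the book. It labels each entry with the canonical term (falling back to the displayed text), so different surface forms of the same subject collapse into one entry; the book's index evidently used short forms such as "XML", so the labeling rule is a choice you would make differently in practice.

```python
import re
from collections import defaultdict

# Hypothetical annotation syntax modeled on the example in the post:
#   {displayed text}(subject-type "canonical term")
# The quoted canonical term is treated as optional.
ANNOTATION = re.compile(r'\{([^}]+)\}\(([\w-]+)(?:\s+"([^"]+)")?\)')

# Hypothetical mapping from subject types to index headings.
TYPE_HEADINGS = {"markup-language": "markup languages"}


def build_index(pages):
    """Group annotated subjects by type, then by canonical term.

    pages: dict mapping page number -> annotated text on that page.
    Returns heading -> term -> set of page numbers.
    """
    index = defaultdict(lambda: defaultdict(set))
    for page, text in pages.items():
        for shown, subject_type, canonical in ANNOTATION.findall(text):
            # findall returns '' for the optional group when it is absent.
            term = canonical or shown
            heading = TYPE_HEADINGS.get(subject_type, subject_type)
            index[heading][term].add(page)
    return index


def render(index):
    """Format the grouped entries as a plain-text index, sorted case-insensitively."""
    lines = []
    for heading in sorted(index, key=str.lower):
        lines.append(heading)
        for term in sorted(index[heading], key=str.lower):
            locators = ", ".join(str(p) for p in sorted(index[heading][term]))
            lines.append(f"  {term}, {locators}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        56: 'Data is often stored as {XML}(markup-language "Extensible Markup Language").',
        68: 'Schemas for {Extensible Markup Language}(markup-language) documents.',
        345: 'Browsers render {HTML}(markup-language).',
    }
    print(render(build_index(sample)))
```

Run on the sample pages, this prints a "markup languages" heading with "Extensible Markup Language, 56, 68" and "HTML, 345" beneath it: the two differently worded mentions on pages 56 and 68 merge because they share a canonical term, which is the terminology control described above.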

The result was a not bad index, but certainly not as good as Richard wanted. In particular, it did not let us do things like this:

constraints, 27, 228, 367–390
  auditing, 431
  cost of reuse, 153
  data entry, 315
  detecting duplication using, 414–415
  extensibility and, 334
  factoring out, 29, 42, 308
  managing reuse, 134
  media-domain, 29
  personalization, 167
  rhetorical, 246
  semantic, 312–315
  structural, 312–315
  types of schema, 389
  uniqueness, 174

These types of entries put terms in their narrative context. This requires a human reading of the surrounding text. It can't be done effectively from subject-domain semantic markup, and it certainly can't be done reliably (yet) by indexing apps working on unstructured text.

Why is this important? Search engines have two big advantages over indexes (other than their enormous advantage in scope, which I mentioned earlier). First, indexes work on individual terms, while search engines can work with phrases and sentences. You can type an entire question into a search engine and it will use the whole sentence to discern what you are interested in. In other words, you can put your search terms in their narrative context up front by searching on the right phrase.

Second, search engines have a ranking algorithm that does a remarkably good job (most of the time) at selecting the most relevant entry and putting it at the top of the list. Indexes, by contrast, list pages in numerical order. If you want to get really fancy, you can bold page numbers for the main entries for a subject, but that is not in any way specific to the user's individual query. Search engines not only rank the subject matter statically, but also rank it for the known interests of the individual user.

These entries that put terms into their narrative context help indexes partially make up for these deficiencies vis-à-vis search engines. They can only be created by hand, and Richard felt it was important to do this for the book, particularly in cases where a subject is mentioned many times. An undifferentiated list of 30 page references presents a rather daunting task to the reader. The context-setting entries can help them narrow down what they are looking for.

We did not throw out the automated generation of the index altogether, however. Rather, we added markup that allowed us to supplement the generated entries with human-created entries (100% of which were created by Richard, who is way better at this sort of thing than I am). As a result of this hybrid approach, we were able to reduce the indexing effort significantly, while still incorporating valuable index features that can only be created by hand.
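One way to picture the hybrid approach is as a merge step: machine-generated locators from the subject markup supply the main entry, and hand-written subentries are attached beneath it. The Python below is a minimal sketch under that assumption only; merge_entries, render, and the dictionary shapes are hypothetical, and it uses single page numbers rather than ranges such as 367–390.

```python
def merge_entries(generated, human):
    """Combine machine-generated locators with hand-written subentries.

    generated: dict term -> set of page numbers derived from subject markup.
    human: dict term -> dict subentry -> set of page numbers from an indexer.
    Returns dict term -> (sorted main locators, sorted (subentry, pages) pairs).
    """
    merged = {}
    for term in sorted(set(generated) | set(human), key=str.lower):
        main = sorted(generated.get(term, set()))
        subs = sorted(
            ((sub, sorted(pages)) for sub, pages in human.get(term, {}).items()),
            key=lambda item: item[0].lower(),
        )
        merged[term] = (main, subs)
    return merged


def render(merged):
    """Print each main entry with its locators, then its indented subentries."""
    lines = []
    for term, (main, subs) in merged.items():
        lines.append(term + "".join(f", {p}" for p in main))
        for sub, pages in subs:
            lines.append(f"  {sub}, " + ", ".join(str(p) for p in pages))
    return "\n".join(lines)


if __name__ == "__main__":
    generated = {"constraints": {27, 228, 367}}
    human = {"constraints": {"auditing": {431}, "cost of reuse": {153}}}
    print(render(merge_entries(generated, human)))
```

Keeping the two sources separate until the merge means the generated locators can be regenerated whenever the text changes without overwriting the indexer's hand-made subentries.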

I'm planning to blog about this and other aspects of the development process for the book sometime soonish.

Mark

> -----Original Message-----
> From: techwr-l-bounces+mbaker=analecta -dot- com -at- lists -dot- techwr-l -dot- com <techwr-l-bounces+mbaker=analecta -dot- com -at- lists -dot- techwr-l -dot- com> On Behalf Of Jonathan Baker
> Sent: Friday, July 27, 2018 7:22 AM
> To: dick -at- rlhamilton -dot- net
> Cc: techwr-l -at- lists -dot- techwr-l -dot- com
> Subject: Re: Looking for classes in indexing
>
> I'm not into religious wars, so I won't go there. However, in following this
> conversation, it occurred to me that there may be some tools out there to
> automate the indexing process. I didn't do a search, but did stumble upon a
> tool called TExtract (texyz.com). I haven't used it, but may the next time I
> need to do an index.
>
> Also, one of the best books about indexing was written by Ruth Canedy
> Cross. Unfortunately, Indexing Books is out of print and only sometimes
> available on Amazon.
>
> Jon
>
> Sent from my iPad
>
> > On Jul 26, 2018, at 5:27 PM, dick -at- rlhamilton -dot- net wrote:
> >
> > I can't speak for Tim, but in my experience, I have uncovered terminology
> problems while indexing a book (e.g., inconsistent use of terminology,
> unnecessary use of synonyms, terms used before they are defined, etc.). It
> also gives you a different perspective on a book, which can reveal problems
> that you missed in editing. While indexing, I have found typos that were
> missed in previous editing passes.
> >
> > But I wouldn't go so far as to say that it's worth doing if you don't publish
> the index; if I go to the trouble of doing an index, then it will be published:-).
> >
> > Best regards,
> > Richard
> > -------
> > XML Press
> > XML for Technical Communicators
> > http://xmlpress.net
> > hamilton -at- xmlpress -dot- net
> >
> >
> >
> >> On Jul 26, 2018, at 14:12, Wright, Lynne <Lynne -dot- Wright -at- Kronos -dot- com>
> wrote:
> >>
> >> Can you elaborate on how indexing is a good way to review your work?
> >>
> >> The docs I used to index were several hundred pages long; it used to take
> me DAYS to compile a properly edited index for them. That's a lot of time to
> spend if you're not actually going to use that index; and I can't think of any
> occasion where it helped me improve the actual content or structure or
> anything.
> >>
> >> -----Original Message-----
> >> From: techwr-l-bounces+lynne -dot- wright=kronos -dot- com -at- lists -dot- techwr-l -dot- com
> >> <techwr-l-bounces+lynne -dot- wright=kronos -dot- com -at- lists -dot- techwr-l -dot- com> On
> >> Behalf Of Slager Timothy J
> >> Sent: Thursday, July 26, 2018 4:52 PM
> >> To: mbaker -at- analecta -dot- com; techwr-l -at- lists -dot- techwr-l -dot- com
> >> Subject: RE: Looking for classes in indexing
> >>
> >> A search engine is an index. In a different form. If you are tagging and
> adding synonyms, you are indexing. I'm all for new formats, but it is still
> indexing (pointing to information).
> >>
> >> A side advantage to an index is that it is an excellent way to review your
> work (or someone else's). The benefits here make it worthwhile even if you
> don't publish the index.
> >>
> >> My 2p. tims
> >>
> >> -----Original Message-----
> >> From: techwr-l-bounces+timothy -dot- slager=dematic -dot- com -at- lists -dot- techwr-l -dot- com
> >> [mailto:techwr-l-bounces+timothy -dot- slager=dematic -dot- com -at- lists -dot- techwr-l -dot- com] On Behalf Of mbaker -at- analecta -dot- com
> >> Sent: Wednesday, July 25, 2018 7:20 PM
> >> To: techwr-l -at- lists -dot- techwr-l -dot- com
> >> Subject: RE: Looking for classes in indexing
> >>
> >> I'm old too, but let's face it, indexes are the paper substitute for a search
> engine. Anything an index can do, a decent search engine can do better (yes,
> including synonyms). More to the point, even the old are so habituated to
> search now that the only way they are going to stumble into your index is if it
> shows up in search results.
> >>
> >> Unless, of course, they actually are reading on paper, because then the
> index is the poor man's search engine, and in that case it better be good,
> because it has a lot to live up to.
> >>
> >> And if there are those out there that still want to claim that indexes are
> better than search engines, here is the clincher: An index only works when
> you have the right book in your hand. Which means you have to find the
> book before you can use the index. But a search engine searches everything.
> >> The reader does not have to locate the book first. Indeed, they probably
> never know which "book" their results came from. They live in a world of
> pages, not books, and they find pages using search. Every Page is Page One.
> >>
> >> If I were looking for a course to take in this day and age, I would take SEO
> before I took indexing. Unless, of course, I was actually preparing a book for
> publication on paper. (Which, as it happens, I am: Structured Writing:
> >> Rhetoric and Process, real soon now from XML Press. I think it has a
> >> pretty good index, most of which is Richard Hamilton's doing.)
> >>
> >> Mark

Follow-Ups:

Re: Looking for classes in indexing, Lin Sims
