Machine Translation handling differing contexts: (was OT: Spanish for Reference Manual)

Subject: Machine Translation handling differing contexts: (was OT: Spanish for Reference Manual)
From: Jeff Allen <jeff -at- multilingual -dot- com>
To: techwr-l -at- lists -dot- raycomm -dot- com
Date: Thu, 15 Jan 2004 01:56:19 +0100

The statement in the message below about Machine Translation (MT) systems not
being able to handle differing contexts is not entirely true. Take for
example the customized version of the KANT system that was deployed in the
mid-90s at the Caterpillar Technical Information Division site in Peoria,
Illinois (USA). Lots of papers on this at:
http://www.lti.cs.cmu.edu/Research/Kant/

Its original design, coupled with Caterpillar Technical English (CTE),
specifically focused on reducing syntactic ambiguity. This was followed by
adding the Caterpillar Translation Memory Tool (TMT) module, which aimed at
streamlining the authoring/translation workflow by combining TM and MT output,
so that MT would not retranslate what had already been entered into the
overall system. The TMT tool is briefly mentioned in:

RINTANEN, Kirsi and Jost ZETZSCHE. Caterpillar uses Déjà Vu and authoring tools
to create localized manuals. In MultiLingual Computing & Technology, #50 Volume
13 Issue 6.
http://www.multilingual.com/

and:

ALLEN, Jeffrey. 1999. Adapting the concept of "Translation Memory" to "Authoring
memory" for a Controlled Language writing environment. Presented at the 21st
Conference of "Translating and the Computer"
(Available on-line at: http://www.transref.org/u-articles/allen2.asp)
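
The "TM first, MT only for what is new" idea described above can be sketched
roughly as follows. This is only an illustration and not the Caterpillar TMT
implementation: the function names, the whitespace normalization and the
stand-in MT function are all assumptions made for the example.

```python
def normalize(segment):
    """Collapse runs of whitespace so trivial layout differences still match."""
    return " ".join(segment.split())


def translate_segment(segment, translation_memory, mt_translate):
    """Return (translation, origin), where origin is "TM" or "MT".

    translation_memory: dict mapping normalized source segments to
        translations that were already validated (hypothetical structure).
    mt_translate: any callable that takes a source segment and returns raw
        MT output (hypothetical stand-in for a real MT engine).
    """
    key = normalize(segment)
    if key in translation_memory:
        # Already translated once: reuse it instead of sending it to MT again.
        return translation_memory[key], "TM"
    # New material: fall back to MT. The raw output would still need postediting.
    return mt_translate(segment), "MT"


def fake_mt(segment):
    """Stand-in MT engine for the example; it only marks the text as MT output."""
    return "[MT] " + segment
```

Only exact matches after whitespace normalization are reused here. Real TM
tools also offer fuzzy matching, which this sketch leaves out.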

Automatic disambiguation guessing, based on additional analyses of legacy and
reauthored texts, appeared in this system in the late 90s. Again, see papers at:
http://www.lti.cs.cmu.edu/Research/Kant/

And then the next level was adding statistical analysis as described in
"Improvement of French generation for the KANT machine translation system" by
Eric Crestan.
(http://www.mail-archive.com/mt-list%40eamt.org/msg00259.html)

As for titles, subtitles and references in that overall authoring and
translation workflow: authors and translators leave out the indefinite
(e.g., a, an, any, some) or definite (e.g., the) articles in the title and
subtitle information elements (which are independent entities), but it is a
little more difficult with reference id SGML tags that refer to manual titles
in the middle of a sentence. Although authors should in theory put the definite
or indefinite article in the paragraph text (and thus outside of the reference
id tag), it is also easy for the article to slip inside the reference id tag.
This cannot be seen when the show-tags feature is turned off. When technical
authors put the SGML editor in Hide-tags mode for easier reading, that little
article can easily find its way inside the reference id tag, and then the MT
system will of course translate the article.
This is an author and translator training issue, and not simply a system problem.
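
A simple pre-translation check can catch the stray article described above
even when tags are hidden in the editor. The sketch below is an assumption
made for illustration: the tag name <refid> is hypothetical (the real
Caterpillar tag names are not given here), and the article list is just the
one mentioned above.

```python
import re

# Articles that should stay outside a reference tag (list taken from the text above).
ARTICLES = {"a", "an", "any", "some", "the"}

# Hypothetical tag name; adjust to whatever the real DTD uses.
REF_TAG = re.compile(r"<refid>(.*?)</refid>", re.IGNORECASE | re.DOTALL)


def find_articles_in_refs(text):
    """Return the content of every reference tag that starts with an article."""
    problems = []
    for match in REF_TAG.finditer(text):
        content = match.group(1).strip()
        words = content.split()
        if words and words[0].lower() in ARTICLES:
            problems.append(content)
    return problems
```

Run over a paragraph before it is sent to MT, the function lists the
references an author should fix by moving the article back into the
paragraph text.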

As for commercially marketed MT products, several current products (professional
and expert versions) have specific features and configurable algorithms for
dealing with contextual relations.

The reason that the Transcend MT system at freetranslation.com translates the
English source text 'Reference manual' as 'el manual de la referencia' is
simply that the term 'reference manual' is not in the system's dictionary. For
the majority of commercial packages, it would take 30-60 seconds to enter it
into the user dictionary as a technical term along with the target translation.

See MT software reviews at:

SYSTRAN MT systems:
http://www.multilingual.com/allen58.htm
http://www.multilingual.com/wassmer58.htm

Reverso MT systems:
http://www.multilingual.com/allen50.htm

PROMT MT systems:
ALLEN, Jeffrey and Thomas WASSMER. (in preparation). Review of @promt Standard,
@promt Professional and @promt Expert machine translation software. Scheduled
for publication in Multilingual Computing and Technology magazine, Number 62,
March 2004.

Efforts have been made to improve MT systems by allowing them to learn from
existing databases of MT translated and postedited texts:

ALLEN, Jeffrey and Christopher HOGAN. 2000. Toward the development of a
post-editing module for Machine Translation raw output: a new productivity tool
for processing controlled language. Presented at the Third International
Controlled Language Applications Workshop (CLAW2000).
Listed at: http://www.up.univ-mrs.fr/~veronis/claw2000/
Available at: http://www.controlled-language.org

More recently, several publications and presentations have explained the
different approaches to using MT systems and the practical productivity issues
concerning their use.

ALLEN, Jeffrey. 2003. Post-editing. In Computers and Translation: A Translator's
Guide. Edited by Harold Somers. Benjamins Translation Library, 35. Amsterdam:
John Benjamins. (ISBN 90 272 1640 1).
(information available at
http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=BTL_35)

ALLEN, Jeffrey. 2001. Postediting: an integrated part of a translation software
program. In Language International magazine, April 2001, Vol. 13, No. 2, pp. 26-29.

ALLEN, Jeffrey. 2003. Tutorial on Machine Translation Postediting. Presented at
the European Association for Machine Translation and Controlled Language
Applications Workshop (EAMT/CLAW2003). 17 May 2003. Dublin City University,
Ireland.

All of the above-mentioned publications have reference sections that point to
other publications and conference presentations by other authors on the same topics.

Regards,

Jeff Allen
Editorial Advisory Board, MultiLingual Computing & Technology
http://www.multilingual.com/editorialBoard/
jeff -at- multilingual -dot- com or jeff -dot- allen -at- free -dot- fr
-------------

Subject: Re: OT: Spanish for Reference Manual
OBrien_David_P -at- cat -dot- com
Date: Fri, 22 Aug 2003 10:07:25 +1000
wrote:

Not for a title. If you were to ask Where is the reference manual or
otherwise refer to the book specifically, then you would include the article
for Manual, but not for Referencia. Manual de Referencia is correct for a
title.
Machine translation services generally don't handle differing contexts
successfully.


22 Ago 2003 09:58, Goober Writer wrote:
According to http://www.freetranslation.com/ it's el manual de la referencia and
my faint recollection of high school Spanish seems to agree.


Karen Casemier wrote:
Can anyone tell me the Spanish equivalent of Reference Manual? I've had to
format a Spanish manual (don't ask, long story) and realized I have no idea
what to put on the cover page. This is for a software product, and I refer to
it as a reference manual vs. a user manual because it is basically a list of
all dialog boxes and fields, not a list of step-by-step instructions. Audience
is mainly Mexico and Latin America.



