Tool to Analyze Text for Possible Snippets

Subject: Tool to Analyze Text for Possible Snippets
From: Paul Hanson <twer_lists_all -at- hotmail -dot- com>
To: "TechWhirl (techwr-l -at- lists -dot- techwr-l -dot- com)" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Thu, 12 Apr 2018 19:59:42 +0000

Hi,

I am looking at 8 different Word documents. The end game for these documents is to import them into my HAT (RoboHelp 2015) and maintain them in HTML. No problem - I know how to do all that.

What I want to pick your brains about is how to determine the frequency of the duplicated text. I know there is duplicate text across the documents because I took the 8 Word documents, inserted each into a single Word document, stripped out the graphics, and sorted the paragraphs.

I ended up with 280 sentences.

Sure, I can visually scan the list and find a sentence like this - "Create and confirm a 4-digit Citrix PIN." - and see that it exists twice. I know I could paste the list of 280 sentences into Excel and remove the rows that are duplicated - that's NOT what I'm looking for.

Instead, I'm looking for something close to this site: https://www.online-utility.org/text/analyzer.jsp, BUT I want to know how many times a sentence exists. For example, I pasted in the 280 sentences and the site came back with this information:

    Some top phrases containing 8 words (without punctuation marks)    Occurrences
    configure secure hub configure secure hub configure secure         4

However, that phrase actually comes from these six identical lines in my list:

    Configure Secure Hub
    Configure Secure Hub
    Configure Secure Hub
    Configure Secure Hub
    Configure Secure Hub
    Configure Secure Hub

So what I want to do is paste in the 280 sentences and get a report that "Configure Secure Hub" appears 6 times in the list of 280.

Have you found an easy way to do this?
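
One possible approach is a short script instead of an online tool. The following is only a minimal sketch in Python, assuming the 280 sentences are saved one per line in a plain-text file; the function names and the file name sentences.txt are hypothetical and not part of any existing tool. It counts whole lines, not word sequences, so six lines of "Configure Secure Hub" are reported as 6.

```python
from collections import Counter


def count_duplicates(lines, min_count=2):
    """Return (text, count) pairs for lines that occur at least min_count times."""
    # Strip surrounding whitespace and skip blank lines, so that a trailing
    # space or an empty paragraph does not produce a separate entry.
    cleaned = [line.strip() for line in lines if line.strip()]
    counts = Counter(cleaned)
    repeated = [(text, n) for text, n in counts.items() if n >= min_count]
    # Most frequent first; ties are sorted alphabetically.
    return sorted(repeated, key=lambda pair: (-pair[1], pair[0]))


def count_duplicates_in_file(path, min_count=2):
    """Same report, reading one sentence per line from a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return count_duplicates(handle, min_count)


def format_report(pairs):
    """Render the pairs as tab-separated 'count<TAB>text' lines."""
    return "\n".join(f"{n}\t{text}" for text, n in pairs)
```

For example, print(format_report(count_duplicates_in_file("sentences.txt"))) would list each repeated sentence with its count. The comparison is exact and case-sensitive, so "Configure Secure Hub" and "Configure secure hub" would be counted separately unless the lines are normalized first.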

The next step, after I figure out how to get the list of duplicated text, is to generate .hts files (snippet files that RoboHelp recognizes). That way I can analyze the text outside of RoboHelp, create the .hts files, import the snippets into RoboHelp, and then run find-and-replace actions to replace "Configure Secure Hub" with a reference to the snippet that stores that text. I know how to create the snippet file, using a DOS command to "Copy [template.hts file] [name of snippet file]", but I have yet to figure out how to get the actual text into the snippet without manually pasting it in. That comes after I figure out how to analyze the text automatically to know that "Configure Secure Hub" is repeated 6 times in the 280 sentences.
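
The copy-and-paste step could probably be scripted as well. What follows is a rough sketch, not a tested RoboHelp workflow. It assumes template.hts is an HTML-like text file that can be read as UTF-8, and that a marker has been typed into the template where the snippet text should go. The marker {{SNIPPET_TEXT}}, the function names, and the file-naming scheme are all hypothetical.

```python
import html
import re
from pathlib import Path

# Hypothetical marker that would have to be added to template.hts by hand.
PLACEHOLDER = "{{SNIPPET_TEXT}}"


def snippet_filename(text, max_len=40):
    """Build a file name such as Configure_Secure_Hub.hts from the snippet text."""
    # Replace every run of characters other than letters and digits with "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
    return (slug[:max_len].rstrip("_") or "snippet") + ".hts"


def fill_template(template, text):
    """Insert the text at the marker, escaping &, <, > and quotes for HTML."""
    return template.replace(PLACEHOLDER, html.escape(text))


def write_snippets(template_path, texts, out_dir):
    """Write one .hts file per text into out_dir and return the paths written."""
    template = Path(template_path).read_text(encoding="utf-8")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for text in texts:
        target = out / snippet_filename(text)
        target.write_text(fill_template(template, text), encoding="utf-8")
        written.append(target)
    return written
```

Calling write_snippets("template.hts", ["Configure Secure Hub"], "snippets") would produce snippets/Configure_Secure_Hub.hts. Two long sentences that share their first 40 characters would map to the same file name, so the names are worth checking before importing anything.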

Paul Hanson
My blog: http://prhmusic.blogspot.com
Me Playing Drums: http://prhmusic.blogspot.com/p/videos-of-me-playing-drums.html
Twitter: @prhmusic



