Dr J R Stockton
2/29/2016 11:15:00 PM

I have a reference to the body element of a local Web page, and can
assume that its onload handler (body.onload) has finished. I also have
a RegExp, built from the value of an <input type="text"> element.
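
A minimal sketch of building that RegExp follows. It assumes the user
types only the pattern, without the enclosing slashes, and that all
matches are wanted. String.prototype.match returns every match only
when the g flag is set, so the sketch always adds g. new RegExp throws a
SyntaxError on an invalid pattern or flag, and the sketch turns that
into a null result. The name makeSearchRegExp is hypothetical.

```javascript
// Hypothetical helper: build a global RegExp from user-typed text.
// Returns null if the pattern (or an extra flag) is not valid.
function makeSearchRegExp(patternText, extraFlags = "") {
  // Always include "g"; a Set drops any duplicate flag letters.
  const flags = Array.from(new Set(("g" + extraFlags).split(""))).join("");
  try {
    return new RegExp(patternText, flags);
  } catch (e) {
    if (e instanceof SyntaxError) return null; // bad pattern or bad flags
    throw e;
  }
}

// In a page this might be called as (the element id "pattern" is hypothetical):
//   const re = makeSearchRegExp(document.getElementById("pattern").value);
```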

I want to apply that RegExp to the whole displayed text, all at once or
piecemeal, and get all of the matches. I have been using the match
method on body.innerText, body.innerHTML, or body.textContent, which has
been good enough for my purposes but is not ideal.
For example, the text up<br>on in the HTML source must be treated as the
two words "up on" and not the one word "upon". And, if practical,
"câm" should be treated as a three-letter word.
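
One way to get the displayed text with word breaks intact is to walk the
DOM and emit a space wherever an element normally breaks the line. For
comparison, textContent simply concatenates the text nodes, which is why
it gives "upon"; innerText is layout-aware and turns a <br> into a line
break. The sketch below is a minimal illustration under assumptions: the
helper name collectText and the list of separating element names are
hypothetical, it ignores CSS (a display:none element is still counted,
and a span styled display:block is not treated as a break), and it skips
the content of script and style elements.

```javascript
// Element names treated as word separators. This list is an assumption;
// adjust it to suit the pages being searched.
const SEPARATING = new Set([
  "BR", "P", "DIV", "LI", "TR", "TD", "TH", "H1", "H2", "H3", "H4", "H5",
  "H6", "PRE", "BLOCKQUOTE", "TABLE", "UL", "OL", "DL", "DT", "DD",
]);
// Elements whose content is not displayed text.
const SKIPPED = new Set(["SCRIPT", "STYLE", "NOSCRIPT", "TEMPLATE"]);

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: collect the text under `root`, adding a space at
// each separating element so that "up<br>on" yields "up  on", not "upon".
// It uses only nodeType, nodeName, nodeValue and childNodes, so it works
// on real DOM nodes (e.g. document.body) and on plain mock objects.
function collectText(root) {
  const parts = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      parts.push(node.nodeValue);
      return;
    }
    if (node.nodeType !== ELEMENT_NODE) return; // comments and the like
    const name = node.nodeName.toUpperCase();
    if (SKIPPED.has(name)) return;
    const separates = SEPARATING.has(name);
    if (separates) parts.push(" ");
    for (const child of Array.from(node.childNodes)) walk(child);
    if (separates) parts.push(" ");
  })(root);
  return parts.join("");
}
```

In a page, collectText(document.body) gives one string to which the
RegExp can then be applied all at once. Inline elements such as <b> add
no space, so <b>NA</b>TO still reads as "NATO".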

The immediate aim is to use something like /\b[A-Z]{4,}\b/g to find all
upper-case "words" of four or more letters (not /gi, since with the i
flag [A-Z] also matches lower-case letters), in order to discover most
of the acronyms without too many false positives or negatives, so that
a list of them can be converted into, or used to check, a Glossary.
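
A sketch of the acronym search follows, with one substitution: \b is
replaced by Unicode-aware lookarounds. In JavaScript \b is defined in
terms of \w, which covers only [A-Za-z0-9_], so \b splits "câm" into "c"
and "m"; the lookarounds below treat any letter as part of a word. The
pattern deliberately has no i flag, so only capitals match. It counts
occurrences, which may help in sorting real acronyms from words that
are merely shouted. The names ACRONYM, findAcronyms and checkGlossary
are hypothetical, and the sketch needs an ES2018-capable engine (u flag,
\p{...} escapes, lookbehind).

```javascript
// Four or more upper-case letters, not touching any other letter, digit
// or underscore (mirroring \b, but for all letters, not just ASCII).
// "câm" therefore stays one word, and "ÉCOLE" can match.
const ACRONYM = /(?<![\p{L}\p{N}_])\p{Lu}{4,}(?![\p{L}\p{N}_])/gu;

// Hypothetical helper: distinct candidate acronyms in `text`, in code-unit
// order, each mapped to its number of occurrences. `re` must have the g
// flag, or matchAll throws a TypeError.
function findAcronyms(text, re = ACRONYM) {
  const counts = new Map();
  for (const [word] of text.matchAll(re)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const sorted = [...counts].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return new Map(sorted);
}

// Hypothetical helper: compare the acronyms found with a glossary's terms.
function checkGlossary(found, glossaryTerms) {
  const glossary = new Set(glossaryTerms);
  return {
    missing: [...found.keys()].filter((w) => !glossary.has(w)), // used, not defined
    unused: [...glossary].filter((w) => !found.has(w)), // defined, not used
  };
}
```

In a page, findAcronyms(document.body.innerText) would give the
candidate list; words such as "BBC4" or "ABCD_E" are not matched,
because a digit or underscore counts as part of the word.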

It can be assumed that the authors of the pages are not trying to
deceive this searcher.
How should it best be done, in outline?
--
(c) John Stockton, Surrey, UK. ¬@merlyn.demon.co.uk Turnpike v6.05 MIME.
Merlyn Web Site < > - FAQish topics, acronyms, & links.