[Info-vax] Looking for some text search ideas
John Reagan
xyzzy1959 at gmail.com
Sat Sep 27 08:48:13 EDT 2014
On Saturday, September 27, 2014 8:01:30 AM UTC-4, Paul Sture wrote:
> On 2014-09-27, VAXman- @SendSpamHere.ORG <VAXman- at SendSpamHere.ORG> wrote:
> > "Regular expressions" are incomprehensible Geekery expressed as gibberish
> > to denote WHAT to search for but it does NOT specify the mechanics of HOW
> > to search for it!
>
> True, but Craig specifically mentioned the word "engine" and then Jan-Erik
> mentioned "libraries". It's always a compromise though. If there aren't
> such libraries available for the flavour of Basic David is using then he's
> going to get into the joys of the system management side of Perl or Python
> or... (shouldn't be much, but it's yet another cost in man hours).
>
There are two basic types of regex engines: "regex-directed" and "text-directed". The difference is how the current possible matches are tracked. Consider the expression:
to(nite|knight|night)
matching against "tonight". When it gets to the "n", the "regex-directed" engine (also called NFA) first checks whether "nite" matches, and only moves on to the next alternative if that fails, whereas the "text-directed" engine (also called DFA) notes that both "nite" and "night" are still potential matches.
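To make the difference concrete, here is a rough Python sketch of the two strategies for this one expression. It is a toy of my own, not how any real engine is implemented: the names (regex_directed, text_directed, PREFIX, ALTERNATIVES) are made up for illustration, the match is anchored at the start of the text, and a real engine would compile the pattern rather than hard-code it.

```python
# Toy model of to(nite|knight|night); all names here are hypothetical.
PREFIX = "to"
ALTERNATIVES = ["nite", "knight", "night"]


def regex_directed(text):
    """NFA style: the pattern drives. Try each alternative in order,
    going back to the same spot in the text after every failure."""
    tried = []
    if not text.startswith(PREFIX):
        return None, tried
    rest = text[len(PREFIX):]
    for alt in ALTERNATIVES:
        tried.append(alt)
        if rest.startswith(alt):
            return PREFIX + alt, tried
    return None, tried


def text_directed(text):
    """DFA style: the text drives. Read each character once and keep
    the set of alternatives that could still match."""
    history = []
    if not text.startswith(PREFIX):
        return None, history
    rest = text[len(PREFIX):]
    alive = set(ALTERNATIVES)
    for i, ch in enumerate(rest):
        # Drop every alternative that disagrees with this character.
        alive = {alt for alt in alive if i < len(alt) and alt[i] == ch}
        history.append((ch, sorted(alive)))
        finished = [alt for alt in alive if len(alt) == i + 1]
        if finished:
            return PREFIX + finished[0], history
        if not alive:
            break
    return None, history


if __name__ == "__main__":
    print(regex_directed("tonight"))
    for ch, alive in text_directed("tonight")[1]:
        print(ch, alive)
```

For "tonight" the first function tries "nite", then "knight", then "night", re-reading the same text each time. The second never backs up: after the "n" it is holding "nite" and "night", and the "g" eliminates "nite".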
As you'd expect in the Unix world, some tools use one and some tools use the other. In practice, for small patterns on reasonably sized data the difference is small, but for large patterns or large data you sometimes want to write your regex to suit the engine you have.
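One place where the engine shows through is the order of alternatives. Python's re module is a backtracking (regex-directed) engine, so it takes the first alternative that works rather than the longest one. A small sketch, where first_match is just a helper name I made up:

```python
import re


def first_match(pattern, text):
    """Return the text matched at the start of the string, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


# Same two alternatives, different order, different result.
short_first = first_match(r"to|tonight", "tonight")
long_first = first_match(r"tonight|to", "tonight")

if __name__ == "__main__":
    print(short_first)  # to
    print(long_first)   # tonight
```

A text-directed engine would be expected to report the longest match in both cases, so on a backtracking engine you put the longer alternative first if that is the one you want.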
Most of the things we've been talking about use the NFA engine, but awk uses the DFA, and there is actually a third form as defined by POSIX.
I was doing lots of Perl last year and wanted to learn about regular expressions, so I bought the O'Reilly book on the topic. I found it very helpful. Will I ever need to write a huge regex? Nope. Even my complicated ones aren't really that complicated...
On VMS, I would now feel limited by what SEARCH offers (even after Guy tried to enhance it).
As for readability, yes, they can get really ugly and hard to understand (hence me buying the book), but I've noted that some of the complainers are also TECO fans, so you really have no technical high ground on which to stand. :)