[Info-vax] Removing blank lines from text files
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Sat Jul 13 00:30:07 EDT 2019
On 2019-06-15 01:55:49 +0000, Bob said:
> Is there some simple DCL command, or a few commands, that I can use to
> remove blank lines from a file?
>
> I can obviously write a small DCL script to read the file, and write
> the non blank lines, but I'm wondering if there is an easier way.
In DCL? Not particularly. The typical DCL input-output loop runs to
about a dozen lines (and it could be implemented without intermediate
files, as a stage within a pipe using sys$pipe and ilk), given that you
are only interested in expunging blank lines and not lines of whitespace.
Better choices than trying to parse text in DCL include common
GNV-based or Freeware-based tools such as grep or awk or ilk, or a
language with better text-processing capabilities such as Lua, Perl,
Python, php, etc. Or TPU or SCAN or lib$table_parse, if you're into
that sort of
thing. The grep code necessary here is a one-liner, using a DCL
pipeline or file-based stage and something akin to grep --invert-match
'^$' output-file-or-omit-this-for-next-pipe-stage.txt will get you what
you want.
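For comparison, here is a minimal sketch of the same filter in Python, one of the languages mentioned above. The function name drop_blank_lines and its drop_whitespace_only flag are made up for illustration; by default only truly empty lines are dropped, matching the blank-versus-whitespace distinction drawn above.

```python
from typing import Iterable, Iterator


def drop_blank_lines(lines: Iterable[str],
                     drop_whitespace_only: bool = False) -> Iterator[str]:
    """Yield every line that is not blank, leaving kept lines unmodified.

    By default only truly empty lines are dropped (what grep -v '^$' does).
    With drop_whitespace_only=True, lines holding only spaces or tabs are
    dropped as well.
    """
    for line in lines:
        # Ignore the line terminator when deciding whether a line is empty.
        content = line.rstrip("\r\n")
        if drop_whitespace_only:
            content = content.strip()
        if content:
            yield line
```

As a filter stage this could be driven with something like sys.stdout.writelines(drop_blank_lines(sys.stdin)), after an import sys. How that fits into a DCL pipe depends on the local Python port, so treat the invocation as an assumption.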
OpenVMS and most of its native tools are unfortunately lacking in
text-processing support, particularly around very common character
encodings such as UTF-8, and regular expressions, and parsing. And
there are no OpenVMS RTLs related to UTF-8, nor HTML, nor
JSON/YAML/XML. That's all add-ons and/or open-source.
In general, for the web-scraping task at hand? Use a web-scraping tool
or framework, and a language better suited for this sort of thing than
is DCL. Perl offers Web::Scraper, for instance. There are other good
choices. Lua, Perl, Python, php, and other languages are available for
free, and all have versions ported to OpenVMS... Some of the available
versions are current, and some are crufty.
https://metacpan.org/pod/Web::Scraper
https://github.com/okpanic/lua-spider
https://realpython.com/python-web-scraping-practical-introduction/
As for using a regular expression framework for parsing HTML as was
suggested in another reply, that's not particularly reliable, nor is
it often a particularly good idea. HTML as routinely encountered tends
to be rather less than regular. Famously:
https://stackoverflow.com/a/1732454
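To illustrate the point, here is a small sketch in Python contrasting a naive regular expression with the standard-library html.parser module. The names LinkCollector, extract_links, SAMPLE and NAIVE_HREF are hypothetical and the sample markup is invented; a real scraping job would more likely use one of the frameworks listed above.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented markup of the sort found in the wild: upper-case tag names,
# a line break inside a tag, single-quoted and unquoted attribute values.
SAMPLE = "<p><A class=\"x\"\n   HREF='/docs'>docs</A> <a href=/faq>faq</a></p>"

# A naive pattern that assumes one tidy, double-quoted form.
NAIVE_HREF = re.compile(r'<a href="([^"]*)">')
```

With these definitions, extract_links(SAMPLE) finds both links, while NAIVE_HREF.findall(SAMPLE) finds none. The pattern could be patched for each variation, but that is exactly the treadmill the linked answer warns about.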
--
Pure Personal Opinion | HoffmanLabs LLC