[Info-vax] Removing blank lines from text files

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Sat Jul 13 00:30:07 EDT 2019


On 2019-06-15 01:55:49 +0000, Bob said:

> Is there some simple DCL command, or a few commands, that I can use to 
> remove blank lines from a file?
> 
> I can obviously write a small DCL script to read the file, and write 
> the non blank lines, but I'm wondering if there is an easier way.

In DCL? Not particularly.  The typical DCL input-output loop is about a 
dozen lines, and it could be implemented file-less as a stage within a 
pipe using sys$pipe and ilk.  That's given your interest in expunging 
only blank lines, and not also lines of whitespace.

Use common GNV-based or Freeware-based tools such as grep or awk or 
ilk, or a language with better text-processing capabilities such as 
Lua, Perl, Python, php, etc.  There are better choices than trying to 
parse text in DCL.  Or TPU or SCAN or lib$table_parse, if you're into 
that sort of thing.   The grep code necessary here is a one-liner.  
Using a DCL pipeline or file-based stage, something akin to 
grep --invert-match '^$' output-file-or-omit-this-for-next-pipe-stage.txt 
will get you what you want.
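
For comparison, here is a rough sketch of the same filter in Python, 
one of the languages mentioned above.  The function name and the 
also_whitespace flag are made up for illustration.  By default it drops 
only truly empty lines, roughly what the grep invocation does, and the 
flag extends that to lines holding nothing but whitespace.

```python
from typing import Iterable, Iterator


def drop_blank_lines(lines: Iterable[str],
                     also_whitespace: bool = False) -> Iterator[str]:
    """Yield the lines that are not blank (hypothetical helper).

    By default a line counts as blank only when it is empty apart from
    its line terminator. With also_whitespace=True, lines containing
    only spaces or tabs are dropped as well.
    """
    for line in lines:
        content = line.rstrip("\r\n")    # ignore the line terminator
        if also_whitespace:
            content = content.strip()    # ignore spaces and tabs too
        if content:
            yield line                   # pass the line through unchanged


sample = ["first\n", "\n", "   \n", "second\n"]
print(list(drop_blank_lines(sample)))
print(list(drop_blank_lines(sample, also_whitespace=True)))
```

To use it as a pipe stage, something like 
sys.stdout.writelines(drop_blank_lines(sys.stdin)) should do, after an 
import sys.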

OpenVMS and most of its native tools are unfortunately lacking in 
text-processing support, particularly around very common character 
encodings such as UTF-8, and regular expressions, and parsing.  And 
there are no OpenVMS RTLs related to UTF-8, nor HTML, nor 
JSON/YAML/XML.  That's all add-ons and/or open-source.

In general, for the web-scraping task at hand? Use a web-scraping tool 
or framework, and a language better suited for this sort of thing than 
is DCL.  Perl offers Web::Scraper, for instance.  There are other good 
choices.  Lua, Perl, Python, php, and other languages are available for 
free, and all have versions ported to OpenVMS...  Some of the available 
versions are current, and some are crufty.
https://metacpan.org/pod/Web::Scraper
https://github.com/okpanic/lua-spider
https://realpython.com/python-web-scraping-practical-introduction/

As for using a regular expression framework for parsing HTML as was 
suggested in another reply, that's not necessarily particularly 
reliable, nor often a particularly good idea.  HTML as routinely 
encountered tends to be rather less than regular.  Famously:
https://stackoverflow.com/a/1732454
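
As a small illustration of the alternative, here is a sketch using the 
html.parser module from the Python standard library rather than a 
regular expression.  The LinkCollector class, the extract_links helper, 
and the sample markup are hypothetical, and a real scraping job would 
more likely use one of the frameworks linked above.  The point is only 
that a tolerant parser copes with unclosed tags, mixed case, and 
unquoted attributes, which tend to trip up hand-written patterns.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical example class)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling
        # this method; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return the href of every anchor found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately sloppy markup: unclosed tags, upper case, unquoted value.
messy = '<p>Unclosed paragraph <A HREF="one.html">one<a href=two.html>two'
print(extract_links(messy))
```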





-- 
Pure Personal Opinion | HoffmanLabs LLC 



