[Info-vax] Databases versus RMS

Thu Apr 19 12:29:55 EDT 2012

Paul Sture wrote 2012-04-19 16:42:
> On Wed, 18 Apr 2012 08:32:55 -0500, Bob Koehler wrote:
>
>> In article<4f8dbbf4$0$24585$c3e8da3$e408f015 at news.astraweb.com>, JF
>> Mezei<jfmezei.spamnot at vaxination.ca>  writes:
>>>
>>> Are database engines so fragile that a power failure will truly wreak
>>> havok on a database requiring time consuming work and debugging ?
>>
>>      From time to time, things that should work, like databases, or
>>     filesystems, don't.  Generally vendors will fix things real quick. I
>>     once spec'd IBM AIX in part because I assumed that IBM would provide
>>     a bug free filesystem.  Boy was I suprized when I had to manually run
>>     fsck after every power outage.  I was not suprized when the next
>>     minor version release fixed it.
>
> fsck on Linux will also run every n restarts.  On my first Linux system a
> dozen years ago, n was something like 6.  Since it was a dual boot system
> I could easily get there several times a week.  It took so long with the
> hardware I was using that it was time to put the kettle on. And possibly
> make a sandwich too.
>
>>     "database" could also be a collection of data, without an actual
>>     DBMS.
>>
>>     But it's easy to buy a DBMS that does work, and not uses it
>>     correctly.
>
> I had a clear demonstration of that with some trial personal accounting
> software for Macs, which was so slow as to be unusable.  I know I wasn't
> running it on the latest and greatest hardware, but that same machine was
> quite capable of running industrial strength commercial accounting
> packages.
>
> It turned out that this software was using PostgreSQL, and provided dumps
> in SQL format (plus marks for that).  OTOH inspection of the SQL revealed
> countless indexes which only ever contained one of 2 or 3 values.
>
> Sorry, but I learn that indexes with only 2 values such as "M" and "F"
> were bad news back in RMS days.  I'm pretty sure the RMS documentation
> included that as an example of things not to do.

I sure RMS doesnt handle that in a good way.

Rdb handles this quite nicely today using "sorted ranked" indexes.

SQL> select count(*) from <table>;

      2957357
1 row selected

SQL> show index <table>_INDEX_4;
Indexes on table <table>:
<table>_INDEX_4                  with column <f1>
                                 and column <f2>
   Duplicates are allowed
   Type is Sorted Ranked
     Duplicates are Compressed Bitmaps
   Key suffix compression is DISABLED
   Node size  430

SQL>

Both <f1> and <f2> are CHAR(1) fields.

SQL> set trans read only;
SQL>  select <f1>, <f2>, count(*) from <table> group by <f1>, <f2>
  <f1>  <f2>
        0          9928
        1             5
  0     0        561157
  0     1           101
  A     0           192  <<==
  A     1       2329898
  P     0         55871
  P     1           132
  S     0            73
9 rows selected
SQL>

Each 15 min there is a job looking for rows with the "A"/"0" value pair.
Runs in below 2 sec (666 MHz DS20e) with this index.

And the total size of the index is (since it uses "compressed bitmap")
small, less then *one byte* total per indexed row in the table.
Aprox 2.4 MB index space for 2.9 million indexed rows.

Yes, probably a bad design to start with, but that happend
back inthe 90's... :-)

Jan-Erik.