[Info-vax] OpenVMS x86-64 and RDB and DB's in general on OpenVMS

Arne Vajhøj arne at vajhoej.dk
Sun Jun 28 17:47:11 EDT 2015


On 6/28/2015 9:49 AM, Stephen Hoffman wrote:
> On 2015-06-28 11:31:10 +0000, IanD said:
>> I like Cassandra as a nosql DB, no single point of failure sort of
>> melds in with the OpenVMS cluster concept but it also by-passes the
>> need for a traditional OpenVMS style of cluster too
>
> Apache Hadoop is a framework for large-scale distributed applications —
> way past what an OpenVMS cluster can provide — and Apache Cassandra
> integrates with that.

I am sure that you know it, but just to make sure everybody does:
* the NoSQL database built on Hadoop is HBase
* Cassandra is not built on Hadoop
* Cassandra is integrated with Hadoop (which I take to mean that
   data can be stored in a Cassandra cluster and processed by a
   Hadoop cluster - see the sketch below)
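
As a rough sketch of that last point (not something I have run; the
host, keyspace and table names are placeholders), a Hadoop MapReduce
job can take its input straight from a Cassandra table by using the
CqlInputFormat from Cassandra's Hadoop integration package
(org.apache.cassandra.hadoop) instead of reading from HDFS:

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraFedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // point the input format at the Cassandra cluster and at the
        // keyspace/table to scan ("demo"/"events" are made-up names)
        ConfigHelper.setInputInitialAddress(conf, "cassandra-node1");
        ConfigHelper.setInputColumnFamily(conf, "demo", "events");
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");

        Job job = Job.getInstance(conf, "process events stored in Cassandra");
        job.setJarByClass(CassandraFedJob.class);

        // input splits now come from the Cassandra ring, not from HDFS
        job.setInputFormatClass(CqlInputFormat.class);

        // mapper, reducer and output configuration would follow here
        // exactly as for any other Hadoop job; omitted to keep it short

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

So "integrated with" means the processing framework is Hadoop's while
the storage and replication stay Cassandra's.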

> OpenVMS clustering is — as I've been commenting for a while — not
> necessary for many large-scale environments, and clustering doesn't
> provide particularly large-scale configurations by present-day standards.

Yes.

A few snippets.

Cassandra

http://cassandra.apache.org/

<quote>
One of the largest production deployments is Apple's, with over 75,000 
nodes storing over 10 PB of data. Other large Cassandra installations 
include Netflix (2,500 nodes, 420 TB, over 1 trillion requests per day), 
Chinese search engine Easou (270 nodes, 300 TB, over 800 million requests 
per day), and eBay (over 100 nodes, 250 TB).
</quote>
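
And on IanD's point about no single point of failure: from the
application side such a cluster is just a list of contact points, and
the driver discovers the rest of the ring itself, so losing any one
node does not take the application down. A minimal client sketch with
the DataStax Java driver (untested; "node1".."node3", the "demo"
keyspace and the "events" table are made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraClientSketch {
    public static void main(String[] args) {
        // any of the listed nodes can answer; the driver learns about
        // the remaining nodes from whichever one it reaches first
        Cluster cluster = Cluster.builder()
                .addContactPoints("node1", "node2", "node3")
                .build();
        Session session = cluster.connect("demo");

        ResultSet rs = session.execute("SELECT id FROM events LIMIT 10");
        for (Row row : rs) {
            System.out.println(row.getString("id"));
        }

        cluster.close();
    }
}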

HBase

http://radar.oreilly.com/2014/04/5-fun-facts-about-hbase-that-you-didnt-know.html

<quote>
Flurry has the largest contiguous HBase cluster: Mobile analytics 
company Flurry has an HBase cluster with 1,200 nodes (replicating into 
another 1,200 node cluster). Flurry is planning to significantly expand 
their large HBase cluster in the near future.
</quote>
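
To back up the "built on Hadoop" point above: an HBase client talks to
the HBase region servers, but the configuration it picks up
(hbase-site.xml) points hbase.rootdir at HDFS, so the data ends up in
the Hadoop file system. A minimal sketch with the HBase 1.0 client API
(untested; the "events" table and "d" column family are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        // HBaseConfiguration reads hbase-site.xml, which is where the
        // cluster is told to keep its data in HDFS
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                          Bytes.toBytes("hello"));
            table.put(put);
        }
    }
}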

Hadoop

https://wiki.apache.org/hadoop/PoweredBy

<quote>
Facebook

We use Apache Hadoop to store copies of internal log and dimension data 
sources and use it as a source for reporting/analytics and machine learning.

Currently we have 2 major clusters:

A 1100-machine cluster with 8800 cores and about 12 PB raw storage.

A 300-machine cluster with 2400 cores and about 3 PB raw storage.

Each (commodity) node has 8 cores and 12 TB of storage.
</quote>

<quote>
Yahoo!

More than 100,000 CPUs in >40,000 computers running Hadoop

Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)
</quote>

Arne
