[Info-vax] OpenVMS servers and clusters as a cloud service

IanD iloveopenvms at gmail.com
Sun Jan 14 15:21:55 EST 2018


On 2018-01-08 at 09:25, IanD wrote: 

> This is just silly... 

>> What good is it if I run a batch job for 36 hours on VMS if the node 
>> crashes and I have to start again. I've wasted 36 hours! 

> Then you have a very badly designed batch job. Either split it up into 
> multiple jobs or make sure in some other way that the job is restartable. 
> Such a job should of course be able to pick up where it crashed. 

> Anyway, I think that is a very big "IF" there. Show me a real-life 
> example of one such job instead... 
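For what it's worth, the checkpoint-and-resume pattern being described there can be sketched in a few lines. This is Python purely for illustration, and the checkpoint file name and chunking scheme are made up, not from any real job:

```python
import json
import os

CHECKPOINT = "job.checkpoint"  # hypothetical checkpoint file name

def load_checkpoint():
    """Return the index of the last completed chunk, or -1 on a fresh start."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_done"]
    return -1

def save_checkpoint(i):
    with open(CHECKPOINT, "w") as f:
        json.dump({"last_done": i}, f)

def run_job(chunks, process):
    """Process chunks in order, skipping any already recorded as done.

    If the node crashes mid-run, rerunning the job resumes at the first
    unprocessed chunk instead of redoing 36 hours of work."""
    start = load_checkpoint() + 1
    for i in range(start, len(chunks)):
        process(chunks[i])
        save_checkpoint(i)  # record progress only after the chunk succeeds
```

The point of contention is not whether this is possible by hand (it clearly is), but how much of this kind of bookkeeping you end up writing yourself.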


Have you worked with large data sets when crunching things like genomic data? The data sets start in the TB range and go up from there

Have a look at the machine learning competitions on the likes of kaggle and some of the cut down data sets and you'll get an idea of some of the large data volumes in use out there

Why do you think the likes of Hadoop exist?

It exists to handle such volumes without having to hand-code a dedicated workload job and deal with all the complexity of machine failure

It's not just about job restartability. It's about parallel processing with built-in redundancy, without having to code it for each individual workload

Hadoop can do out of the box what you could only do on VMS by writing a shitload of handler code for distributed workload management

Try writing a distributed data analyser on VMS to crunch through 5 PB of data, collate it all, and cope with machine outages/failures of unknown type along the way, and see how long it takes
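To give a feel for the shape of computation Hadoop automates, here is a deliberately tiny single-machine sketch of map/shuffle/reduce, with word counting as the stock example and threads standing in for cluster nodes. None of this is Hadoop's actual API:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_chunk(text):
    # "map" phase: emit (key, 1) pairs for one input split
    return [(word.lower(), 1) for word in text.split()]

def word_count(chunks, workers=4):
    """Toy map/shuffle/reduce over in-memory chunks.

    Hadoop runs the same shape of job, but with the input splits, the
    shuffle, and the reducers spread across thousands of machines, and
    with failed tasks re-run automatically."""
    counts = defaultdict(int)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for pairs in pool.map(map_chunk, chunks):  # parallel map phase
            for key, n in pairs:                   # shuffle + reduce, locally
                counts[key] += n
    return dict(counts)
```

The toy version fits on a slide; the hard part, and the part being argued about here, is everything around it: splitting 5 PB of input, moving the shuffle over a network, and surviving node failures mid-job.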

Try encoding 100,000 videos to transform them from one format to another, keep track of the conversions in real time, and again handle any failure along the way, and see how much wrapper code it would require on the VMS platform. Google does this type of task quite a lot, as does Netflix etc. Training providers also do mass video conversion

None of these can be done within reasonable timeframes on any VMS cluster natively while taking into account any type of failure along the way

You'd effectively be trying to rewrite major chunks of what Hadoop already does and has developed over many years now

Hadoop is the modern cluster as far as data crunching is concerned. It handles machine failure natively and in real time, without operator intervention. Job failure handling is built into Hadoop

Watch the linked video for at least the first few minutes and then tell me how long it would take to develop the equivalent on a VMS cluster, to handle everything Hadoop can already do today and to scale up.

The video makes reference to Hadoop scaling in near-linear fashion up to 4,500 nodes. I think that limit has been raised since then

https://youtu.be/OoEpfb6yga8

You will of course have noted that Hadoop also automatically handles data redistribution when a worker dies or fails to respond within a given timeframe
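That resubmit-on-failure behaviour is exactly the wrapper code you would otherwise be writing by hand. A minimal single-machine caricature of it, with threads as stand-in workers (a real system would also have to kill the stuck worker and move its share of the data, which this sketch does not attempt):

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_retry(task, args_list, timeout=5.0, max_attempts=3):
    """Run each task; if a worker hangs past `timeout` or raises an
    error, resubmit the work -- a tiny version of what Hadoop's task
    scheduler does automatically across a whole cluster."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        for args in args_list:          # one task at a time, for clarity
            for attempt in range(max_attempts):
                future = pool.submit(task, args)
                try:
                    results[args] = future.result(timeout=timeout)
                    break
                except Exception:
                    continue            # worker died or timed out: retry
            else:
                raise RuntimeError(f"task {args!r} failed {max_attempts} times")
    return results
```

Multiply this by speculative execution, data locality, and redistributing a dead node's blocks, and you get a sense of how much machinery Hadoop ships that a VMS batch queue does not.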

The only thing 'silly' imo is trying to say VMS clusters can handle such large batch compute jobs natively. 

VMS clusters have been superseded in this respect by orders of magnitude. 
The cluster world has moved on in the 10-15 years that VMS has been asleep


