[Info-vax] OpenVMS servers and clusters as a cloud service
IanD
iloveopenvms at gmail.com
Sun Jan 14 15:21:55 EST 2018
> This is just silly...
>
> On 2018-01-08 at 09:25, IanD wrote:
>> What good is it if I run a batch job for 36 hours on VMS if the node
>> crashes and I have to start again. I've wasted 36 hours!
>
> Then you have a very badly designed batch job. Either split it up into
> multiple jobs or make sure in some other way that the job is restartable.
> Such a job should of course be able to pick up where it crashed.
>
> Anyway, I think that is a very big "IF" there. Show me a real-life
> example of one such job instead...
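Sure, any single job can be made restartable, but that means every long-running job carries its own checkpoint/restart plumbing, roughly like this minimal Python sketch (the file name, record counts and checkpoint interval are all made up for illustration), and it still only restarts on the same node; it does nothing to spread the work out or route around a dead machine:

#!/usr/bin/env python3
# Sketch of the checkpoint/restart plumbing a "restartable" batch job needs.
# The record source, checkpoint file name and counts here are hypothetical.
import json
import os

CHECKPOINT_FILE = "job.checkpoint"   # hypothetical path
TOTAL_RECORDS = 1_000_000            # hypothetical workload size

def load_checkpoint():
    """Return the index of the next record to process (0 if starting fresh)."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["next_record"]
    return 0

def save_checkpoint(next_record):
    # Write-then-rename so a crash mid-write can't corrupt the checkpoint.
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_record": next_record}, f)
    os.replace(tmp, CHECKPOINT_FILE)

def process_record(i):
    pass  # stand-in for the real per-record work

def main():
    start = load_checkpoint()            # resume where we crashed, if we did
    for i in range(start, TOTAL_RECORDS):
        process_record(i)
        if i % 10_000 == 0:              # checkpoint every 10k records
            save_checkpoint(i + 1)
    save_checkpoint(TOTAL_RECORDS)
    os.remove(CHECKPOINT_FILE)           # finished; clear the checkpoint

if __name__ == "__main__":
    main()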
Have you worked with large data sets, crunching things like genomic data? The data sets start in the TB range and go up from there.
Have a look at the machine learning competitions on the likes of Kaggle, even the cut-down data sets, and you'll get an idea of some of the large data volumes in use out there.
Why do you think the likes of Hadoop exist?
It's to handle such volumes without having to write a dedicated workload job by hand and deal with all the complexity of machine failure yourself.
It's not just about job restartability. It's about parallel processing with built-in redundancy, without having to code it specifically for each individual workload.
Hadoop can do, out of the box, more than you could do on VMS without writing a shitload of handler code for distributed workload management.
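To give a sense of how little per-job code that can mean: with Hadoop Streaming, a job can be nothing more than a mapper and a reducer that read stdin and write stdout, and the framework takes care of splitting the input, scheduling, shuffling and retrying failed tasks. Here is a minimal word-count sketch in Python (the script name and the local test pipeline are just for illustration; on a real cluster you'd hand the same two commands to the Hadoop Streaming jar):

#!/usr/bin/env python3
# wc.py - minimal Hadoop Streaming style word count (illustrative sketch).
# Hadoop Streaming runs the mapper and reducer over stdin/stdout on the
# workers; input splitting, scheduling, shuffling and task retries are
# the framework's problem, not the job author's.
#
# Local test (simulating the shuffle with sort):
#   cat input.txt | python3 wc.py map | sort | python3 wc.py reduce
import sys
from itertools import groupby

def run_mapper(lines):
    # Emit "word<TAB>1" for every word seen.
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def run_reducer(lines):
    # The shuffle phase delivers the mapper output sorted by key,
    # so consecutive lines with the same word can simply be summed.
    pairs = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        run_mapper(sys.stdin)
    else:
        run_reducer(sys.stdin)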
Try writing a distributed data analyser on VMS that crunches through 5 PB of data, collates it all, and copes with machine outages/failures of unknown types along the way, and see how long it takes.
Try encoding 100,000 videos to transform them from one format to another, keep track of the conversions in real time, again handle any failure along the way, and see how much wrapper code it requires on the VMS platform. Google does this type of task quite a lot. So does Netflix, etc. Training providers also do mass video conversion.
None of these can be done natively on any VMS cluster within reasonable timeframes while also coping with whatever failures happen along the way.
You'd effectively be trying to rewrite major chunks of what Hadoop already does and has been developing for many years now.
Hadoop is the modern cluster as far as data crunching is concerned. It handles machine failure natively, in real time, without operator intervention. Job failure handling is built into Hadoop.
Watch the linked video for at least the first few minutes and then tell me how long it would take to develop the equivalent on a VMS cluster, handle everything Hadoop can already do today, and scale up.
The video makes reference to Hadoop scaling in a near-linear fashion up to 4,500 nodes. I think that limit has been raised since then.
https://youtu.be/OoEpfb6yga8
You will of course have noted that Hadoop also automatically handles data redistribution when a worker thread dies or fails to respond in a given timeframe.
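To be clear about what rolling that yourself looks like: per job, you end up writing something like the sketch below (Python, with a made-up chunk list, deadline and simulated hang) just to get "kill the unresponsive worker and give its work to someone else", and that's before you even think about where the data lives.

#!/usr/bin/env python3
# Sketch of the wrapper code you'd have to write yourself on a platform
# that does NOT redistribute work automatically: give each chunk of work
# a deadline, kill workers that stop responding, and re-queue their work.
# The task list, deadline and failure rate are made up for illustration.
import multiprocessing as mp
import random
import time

DEADLINE_SECS = 2      # "fails to respond in a given timeframe"
MAX_ATTEMPTS = 3       # re-queue a chunk at most this many times

def crunch(chunk_id, out_q):
    """Stand-in for real work; sometimes hangs to simulate a sick node."""
    if random.random() < 0.3:
        time.sleep(DEADLINE_SECS * 10)          # unresponsive worker
    out_q.put(f"chunk {chunk_id} done")

def run_with_deadline(chunk_id):
    """Run one chunk in its own process; return None if it misses the deadline."""
    out_q = mp.Queue()
    proc = mp.Process(target=crunch, args=(chunk_id, out_q))
    proc.start()
    proc.join(timeout=DEADLINE_SECS)
    if proc.is_alive():                         # missed the deadline
        proc.terminate()
        proc.join()
        return None
    return out_q.get()

def run_all(chunks):
    attempts = {c: 0 for c in chunks}
    pending = list(chunks)
    results = {}
    while pending:
        chunk = pending.pop(0)
        attempts[chunk] += 1
        result = run_with_deadline(chunk)
        if result is not None:
            results[chunk] = result
        elif attempts[chunk] < MAX_ATTEMPTS:
            pending.append(chunk)               # redistribute the work
        else:
            results[chunk] = "FAILED after retries"
    return results

if __name__ == "__main__":
    print(run_all(list(range(8))))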
The only thing 'silly' imo is trying to say VMS clusters can handle such large batch compute jobs natively.
VMS clusters have been superseded in this respect by orders of magnitude.
The cluster world has moved on in the 10-15 years that VMS has been asleep.