[Info-vax] OpenVMS servers and clusters as a cloud service

Johnny Billquist bqt at softjar.se
Mon Jan 1 11:56:00 EST 2018


On 2017-12-31 16:44, Kerry Main wrote:
>> -----Original Message-----
>> From: Info-vax [mailto:info-vax-bounces at rbnsn.com] On Behalf Of
>> Johnny Billquist via Info-vax
>> Sent: December 31, 2017 10:09 AM
>> To: info-vax at rbnsn.com
>> Cc: Johnny Billquist <bqt at softjar.se>
>> Subject: Re: [Info-vax] OpenVMS servers and clusters as a cloud service
>>
>> On 2017-12-31 15:46, Kerry Main wrote:
>>>> I wrote that Google (according to gossip) have one sysadm per 28000
>>>> servers.
>>>>
>>>
>>> That’s crap and/or are following the age old misdirection whereby they
>> do not count a whole lot of resources doing what people would consider
>> sysadmin work.
>>> - Is that one sysadmin managing IP addresses for 28,000 servers?
>>> - is that one sysadmin monitoring backups for 28,000 servers?
>>> - is that one sysadmin monitoring security logs for 28,000 servers?
>>> - is that one sysadmin working with firewall groups to address issues
>> with firewall issues associated with 28,000 servers?
>>>
>>> ROTFL
>>
>> Not sure what you are laughing about, or what the point is...
>>
>>>> I did not write that Google has 28000 servers.
>>>>
>>>> They have a lot more servers.
>>>>
>>>> Internet gossip says that they have 900000 servers.
>>>>
>>>
>>> Huge numbers I have no doubt (likely 90% are VM's).
>>
>> Actually, the number is more than 900,000 and we're talking physical
>> machines. The number of VMs are magnitudes higher.
>>
>>     Johnny
>>
> 
> 
> My point is that I know Amazon and Google etc. have very high numbers of servers (phys or VM really does not really matter from admin perspective), but their Sysadmin ratio is not 28,000 servers for a single admin. Hung services, hung/crashed servers, log monitoring, backup failures, password mgmt., hardware failures, firewall rules integration are all examples of Sysadmin activities where tools and custom automation can certainly help.
> 
> Large companies like this have large Engineering and Operations teams doing nothing but custom NOC / ECC monitoring, backup management, event log and security management and workflow automation of the IT infrastructure.
> 
> Do these teams count as part of their "server" SysAdmin ratio's?
> 
> There are basic things which have to happen for every server instance and yes, smart tools and custom automation absolutely do help, but there is a limit on what tools can do without some level of experience at the controls.  That is why large companies like Google have L1, L2 and L3 escalation support models for each level of their IT dept.
> 
> Do these escalation teams count as part of their "server" SysAdmin ratio's?
> 
> Make no mistake - there is lots to learn about scalability from the Googles and Amazons of the world, but we also need to keep open minds and not blindly accept whatever their marketing depts. pump out about how good they are.

Your guesses about L1, L2 and L3 escalation support, and about how
Google actually manages its systems, are very far from the truth.
They do not have such an organization. In reality, you can hardly say
they have any system administrators at all. It's all fully automated.
Normally, there is no manual labor involved in any phase of the life
of a server. The machines are physically mounted into racks and
registered, and from there on it's all handled automatically: software
gets installed, scheduled, removed and migrated without any sysadm
involved. When something goes wrong with the hardware, the basic
diagnostics also happen automatically, and then hardware ops at the
data center go to the physical machine to diagnose and replace broken
parts, and mark the machine as fixed again, at which point the
automated system picks the machine up again and brings it back into
the work pool.
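
As a rough illustration only, the life cycle described above can be
sketched as a small state machine. This is not Google's actual tooling;
every state name, event name and function here is made up for the
example, and treating "register" as a manual step is my assumption.

```python
from enum import Enum


class State(Enum):
    RACKED = "racked"              # physically mounted, not yet known to the system
    REGISTERED = "registered"      # known to the automation, not yet doing work
    IN_POOL = "in_pool"            # available for automatically scheduled work
    DIAGNOSING = "diagnosing"      # automated basic diagnostics after a fault
    NEEDS_REPAIR = "needs_repair"  # waiting for hardware ops at the data center


# Hypothetical event names; maps (state, event) -> next state.
TRANSITIONS = {
    (State.RACKED, "register"): State.REGISTERED,
    (State.REGISTERED, "provision"): State.IN_POOL,
    (State.IN_POOL, "hardware_fault"): State.DIAGNOSING,
    (State.DIAGNOSING, "dispatch_hardware_ops"): State.NEEDS_REPAIR,
    (State.NEEDS_REPAIR, "marked_fixed"): State.REGISTERED,
}

# Assumption: only these events involve a person; the rest is automation.
MANUAL_EVENTS = {"register", "marked_fixed"}


def step(state, event):
    """Return the next state, or raise ValueError if the event is not valid."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


def run(events, state=State.RACKED):
    """Apply events in order; return (final state, number of manual steps)."""
    manual = 0
    for event in events:
        state = step(state, event)
        if event in MANUAL_EVENTS:
            manual += 1
    return state, manual
```

Running a machine through one fault and repair, i.e.
run(["register", "provision", "hardware_fault",
"dispatch_hardware_ops", "marked_fixed", "provision"]), ends in
State.IN_POOL with two manual steps counted, both of them physical
work at the data center rather than system administration.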

It's even hard to point out who is a sysadm at Google; that type of
role hardly exists. People are working on development, or on keeping
services up and running. Managing machines is something almost nobody
needs to care about.

Management of IP addresses, and all the infrastructure around that, is
also pretty much all automated.
You cannot manage hundreds of thousands of machines without pretty much
fully automating all of it.

   Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


