[Info-vax] Where to locate software
Kerry Main
kerry.main at backtothefutureit.com
Fri Jun 10 23:01:56 EDT 2016
> -----Original Message-----
> From: Info-vax [mailto:info-vax-bounces at info-vax.com] On Behalf Of
> lawrencedo99--- via Info-vax
> Sent: 10-Jun-16 10:14 PM
> To: info-vax at info-vax.com
> Cc: lawrencedo99 at gmail.com
> Subject: Re: [New Info-vax] Where to locate software
>
> On Saturday, June 11, 2016 at 2:05:05 PM UTC+12, Kerry Main wrote:
> > However, I would argue shared nothing architectures (Windows, Linux,
> > UNIX) in distributed db's require much more up front planning because
> > how you split up your Apps servers and especially data is critical. If hot
> > spots occur due to unexpected loads in a few areas, then it becomes
> very
> > difficult to address because you either increase that specific server size
> > (and its designated backup) or re-partition the data or provide error
> > messages to the client - the proverbial "server busy - please try later".
>
> Cluster filesystems, map-reduce, all that kind of thing. There was an
> article from a few years ago, from when Google only ran about 460,000
> physical servers, about how they manage it all.
>
> Does your «insert name of favourite proprietary product here» scale to
> that level?
Nope - Google is unique in that it has unlimited budgets and an application
environment that, on average, is likely north of 90% reads. In addition, errors
mean very little in terms of real user impact e.g. if a search returns an error,
users simply retry. If it errors a second time, they will flip to Bing.
Yes, marketers using data extracted from Google Mail will be upset, but the
impact to end users is minimal to none.
They also use rack servers only, which means huge amounts of network
latency - and that, imho, is going to become the biggest bottleneck in the
future for environments that require update persistence, as opposed to
read-only transactions.
Because they have so many servers, I would also be willing to bet they
have huge numbers of servers which are less than 20% busy at peak times.
The future is all about getting the data closer to the compute engine in
the least amount of time (latency). While compute, storage and memory
performance have increased exponentially in recent years, network latency
(not speed/bandwidth) is emerging as the next big bottleneck in the overall
solution.
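To put rough numbers on that (assumed figures only - roughly 0.5 ms for a
round trip between rack servers, roughly 100 ns for a local memory
reference), a quick back-of-envelope sketch in Python:

# Back-of-envelope: why per-access latency, not link speed, dominates
# when an application makes many small remote data accesses.
# All figures below are assumptions, not measurements.
RTT_ACROSS_RACKS_S = 0.0005      # ~0.5 ms round trip between rack servers
LOCAL_ACCESS_S     = 0.0000001   # ~100 ns local memory reference
N_ACCESSES         = 1_000_000   # small reads in one batch of work

remote_total = N_ACCESSES * RTT_ACROSS_RACKS_S   # one round trip per access
local_total  = N_ACCESSES * LOCAL_ACCESS_S       # data already next to compute

print("remote: %.0f s" % remote_total)   # ~500 s - dominated by round trips
print("local : %.1f s" % local_total)    # ~0.1 s - latency is a non-issue

Faster links do not change that picture much; only fewer and shorter round
trips do - i.e. getting the data next to the compute.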
Google's architecture works for Google, but I would NOT recommend it
for a next generation environment which requires much lower overall
latency AND much lower cost.
Reference: (from 2011, but still applies today)
http://highscalability.com/blog/2011/8/29/the-three-ages-of-google-batch-warehouse-instant.html
" The problem is we aren't meeting this challenge. Our infrastructure is
broken. Datacenters have the diameter of a microsecond, yet we are still
using entire stacks designed for WANs. Real-time requires low and
bounded latencies and our stacks can't provide low latency at scale. We
need to fix this problem and towards this end Luiz sets out a research
agenda, targeting problems that need to be solved:"
Hence, a fundamental conclusion is to eliminate network latency in as
many areas as one can.
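Fwiw, to illustrate the hot spot issue mentioned in the quoted text above
(shared nothing partitioning), here is a minimal sketch - the partition
count and request mix are made-up numbers, purely illustrative:

# Minimal sketch: hash partitioning with one skewed ("hot") key.
# Partition count and request mix below are invented for illustration.
from collections import Counter

N_PARTITIONS = 8
def partition(key):
    return hash(key) % N_PARTITIONS

# 90% of requests hit one popular key; the rest spread across many keys
requests = ["hot-customer"] * 9000 + ["cust-%d" % i for i in range(1000)]

load = Counter(partition(k) for k in requests)
for p in sorted(load):
    print("partition %d: %d requests" % (p, load[p]))

# One partition ends up carrying ~90% of the load while the others sit
# mostly idle - the only remedies are a bigger server for that partition,
# re-partitioning the data, or "server busy - please try later".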
Regards,
Kerry Main
Kerry dot main at starkgaming dot com