[Info-vax] The Future of Server Hardware?
JohnF
john at please.see.sig.for.email.com
Wed Oct 3 22:23:06 EDT 2012
Stephen Hoffman <seaohveh at hoffmanlabs.invalid> wrote:
> JohnF said:
>> Stephen Hoffman <seaohveh at hoffmanlabs.invalid> wrote:
>>>
>>> If you're looking to render images or brute-force passwords, definitely
>>> yes. (This is why I was pointing to bcrypt and scrypt a while back,
>>> too. But I digress.)
>>
>> Thanks for the info. So vectorizable, "definitely yes";
>> parallelizable, "not so much."
And thanks for the additional info...
> GPUs can run various tasks in parallel, as can the cores available
> within most modern boxes. Laptops are now arriving with two and four
> cores, and a workstation can have 8 to 16 cores plus a couple of fast
> GPUs.
>
> How much data do you need to spread around? And how often? How
> closely are the work-bits tied together or dependent? (qv: Amdahl's
> Law.)
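(A back-of-envelope aside from me, not Stephen: Amdahl's Law says
speedup = 1/((1-P) + P/N) for parallel fraction P on N processors.
With P = 0.95 on N = 16 cores, that's 1/(0.05 + 0.95/16) =~ 9.1x,
and even N -> infinity only gets you 1/0.05 = 20x; the serial
fraction takes over fast.)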
>
> Adding boxes into an application design - whether x86-64 or ARM or
> mainframe or rx2600 - means that box-to-box latency and bandwidth are
> in play, too.
>
> The same bandwidth and latency issues arise within a multiprocessor
> box, but those links are usually shorter and faster. (There are also
> some degenerate hardware cases around, where loading and unloading a
> particular GPU is relatively performance-prohibitive, for instance;
> where the GPU is gonzo fast, but getting the data in and out just
> isn't.)
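(My own rough numbers here, not from the thread: PCIe 2.0 x16 moves
roughly 8 GB/s each way, so shipping 1 GB to the card costs on the
order of 125 ms. A card sustaining ~1 Tflop could have done ~125
GFLOPs of work in that time, so unless the kernel does a lot of
arithmetic per byte transferred, the bus is the wall, not the GPU.)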
>
> If you have a good way to distribute that data or schedule the tasks,
> or if you have fast links, then adding boxes scales nicely.
>
> If not, then you're headed for more specialized hardware (mainframe,
> etc), or toward a world of hurt.
>
> For example, VMS used to run into a wall around 8 CPUs or so; where
> adding processors didn't help or could even reduce aggregate
> performance. That's gotten (much) better in recent VMS releases (due
> in no small part to the efforts of some of the VMS engineers to break
> up the big locks), but VMS still doesn't scale as high as some other
> available platforms. Clustering has some similar trade-offs here, too;
> where the overhead of sharing a disk across an increasing number of
> hosts runs into the proverbial wall, for instance.
>
> One size does not fit all.
>> Is there a pretty standard whitebox configuration people typically
>> put together as a gpu platform -- MB make and model, power supply
>> and cooling options, memory, etc? And is nvidia/cuda the current
>> favorite among the gpu/"language" options to put in that box?
>
> There are probably almost as many options as there are opinions.
There seem to be several out-of-the-box vendor-assembled solutions,
e.g., http://www.nvidia.com/object/personal-supercomputing.html
as well as build-your-own recommendations, e.g.,
http://www.nvidia.com/object/tesla_build_your_own.html
(and plenty of non-nvidia pages about all that, too).
But my personal experience assembling boxes from components
doesn't xlate comfortably to that regime, i.e., still hard
to choose wisely.
> I'd tend toward OpenCL, given the choice. But then that's native on
> the platforms that I most commonly deal with.
Thanks for that, too. I'll take a look. In particular, I'm trying
to spec out/prototype migrating a binomial tree solution for
Black-Scholes over to this architecture. The current system works
fine pricing along a monthly tree for 30 years (360 time steps),
but a daily tree (needed to better model rational early exercise of
options) is beyond it. A few desktop TFLOPS might do it, and
justifying the $10K-or-so hardware needed to test that should be
easy. But I don't yet have a shovel-ready proposal.
Programming time is another story. Choosing the best software
library platform to encapsulate the gpu details and expose only
the math is the harder part (for me).
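For scale: a daily tree over 30 years is about 10,950 steps, i.e.,
roughly 60M nodes in a recombining lattice, versus ~65K for the
monthly one. Below is the shape of what I'm imagining -- strictly my
own sketch, in CUDA syntax only because that's what I've read up on
so far (the OpenCL version should be structurally the same), with
made-up names and parameter values, nothing like production code:

    // One backward-induction step per kernel launch, American put
    // on a recombining CRR tree. Illustrative only.
    // Build (assuming the usual toolkit): nvcc binom.cu -o binom
    #include <cstdio>
    #include <cmath>
    #include <vector>
    #include <utility>
    #include <cuda_runtime.h>

    __global__ void step_kernel(const double *vin, double *vout,
                                double S0, double u, double d,
                                double disc, double p, double K,
                                int level)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i <= level) {               // level+1 nodes at this level
            // discounted expectation over the two successor nodes
            double cont = disc * (p * vin[i + 1] + (1.0 - p) * vin[i]);
            // spot at node i; take the better of holding vs exercising
            double S = S0 * pow(u, (double)i) * pow(d, (double)(level - i));
            vout[i] = fmax(cont, fmax(K - S, 0.0));
        }
    }

    int main()
    {
        const int N = 10950;            // ~30 years of daily steps
        const double S0 = 100.0, K = 100.0, r = 0.03, sigma = 0.2;
        const double T = 30.0, dt = T / N;
        const double u = exp(sigma * sqrt(dt)), d = 1.0 / u;
        const double p = (exp(r * dt) - d) / (u - d);
        const double disc = exp(-r * dt);

        // terminal payoffs: the one big host->device transfer
        std::vector<double> term(N + 1);
        for (int i = 0; i <= N; ++i) {
            double S = S0 * pow(u, (double)i) * pow(d, (double)(N - i));
            term[i] = std::fmax(K - S, 0.0);
        }
        double *va, *vb;
        cudaMalloc(&va, (N + 1) * sizeof(double));
        cudaMalloc(&vb, (N + 1) * sizeof(double));
        cudaMemcpy(va, term.data(), (N + 1) * sizeof(double),
                   cudaMemcpyHostToDevice);

        const int TPB = 256;
        for (int level = N - 1; level >= 0; --level) {
            int blocks = (level + TPB) / TPB;  // ceil((level+1)/TPB)
            step_kernel<<<blocks, TPB>>>(va, vb, S0, u, d,
                                         disc, p, K, level);
            std::swap(va, vb);          // computed level is now in va
        }

        double price;                   // one double comes back out
        cudaMemcpy(&price, va, sizeof(double), cudaMemcpyDeviceToHost);
        printf("American put =~ %.4f\n", price);
        cudaFree(va);
        cudaFree(vb);
        return 0;
    }

The lattice never leaves the card after the initial copy -- exactly
the load/unload issue above -- and the shrinking launch width near
the root is the Amdahl-ish part: the last few hundred levels have
less parallelism than the card has cores.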
> But then this is not the best group for x86-64 HPTC questions, either.
--
John Forkosh ( mailto: j at f.com where j=john and f=forkosh )