[Info-vax] Distant Cluster?

Tue Oct 9 03:04:48 EDT 2012

In article <k4vtmd$9db$1 at dont-email.me>, David Froble <davef at tsoft-inc.com> writes:
>Michael Moroney wrote:
>> gartmann at nonsense.immunbio.mpg.de (Christoph Gartmann) writes:
>> 
>>> In article <b9dcb6a6-8b38-4729-9b1e-abac00f3bba1 at googlegroups.com>, Ken Fairfield <ken.fairfield at gmail.com> writes:
>> 
>>>> There are no inherent problems with a long RECNXINTERVAL other
>>>> than (some vague memories I have of) lengthened cluster transition
>>>> times.
>> 
>>> Good to know.
>> 
>> The question you have to ask yourself is whether you or your users can
>> tolerate random "hangs" by the entire cluster for up to RECNXINTERVAL
>> seconds, pretty much any time there is a network glitch such as rebooting
>> a switch.  Because that is what will happen until things resolve themselves
>> or some node(s) get kicked out of the cluster.
>> 
>> Default RECNXINTERVAL is 20 seconds.
>
>That is a timeout value, and only comes into play when the link is down.
>If the cluster is broken, aren't the users hosed anyway?  I'd rather
>they take a short break and come back to where they left off.

The problem wouldn't be the two-minute reboot time for the switch but the time
the rx6600s take to write a dump and reboot (about 10 minutes).

>If it happens often, then perhaps the core problem should be addressed.

It doesn't happen "often" (only for a switch firmware upgrade) but it happens.

>There is "doable" and then there is "prudent".  I'd think a private 
>direct link would be prudent.

That's the way to go, but it will take some time. In the meantime, raising
RECNXINTERVAL is an option.
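
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in
Python. It uses only the figures quoted in this thread (20 second default,
two-minute switch reboot, about 10 minutes for dump and reboot). The 30 second
safety margin and the function names are my own assumptions for illustration,
and the model is deliberately simplified: it only compares an outage length
against the timeout and ignores everything else that goes on during a cluster
transition.

```python
# Figures quoted in this thread (seconds).
DEFAULT_RECNXINTERVAL = 20       # default value mentioned above
SWITCH_REBOOT = 2 * 60           # switch reboot during a firmware upgrade
DUMP_AND_REBOOT = 10 * 60        # approx. rx6600 dump + reboot time


def rides_out(outage_seconds, recnxinterval):
    """Simplified model: the link outage is survived only if it is
    shorter than the reconnection interval."""
    return outage_seconds < recnxinterval


def suggested_interval(outage_seconds, margin=30):
    """Hypothetical helper: longest expected outage plus a safety margin.
    The 30 second margin is an assumption, not a recommendation."""
    return outage_seconds + margin


if __name__ == "__main__":
    candidate = suggested_interval(SWITCH_REBOOT)
    print("Default survives switch reboot:",
          rides_out(SWITCH_REBOOT, DEFAULT_RECNXINTERVAL))
    print("Candidate RECNXINTERVAL:", candidate, "seconds")
    print("Candidate survives switch reboot:",
          rides_out(SWITCH_REBOOT, candidate))
    # Worst case with the candidate is a hang of up to `candidate` seconds,
    # to be weighed against a node being down for dump and reboot.
    print("Worst-case hang vs. dump and reboot:",
          candidate, "vs.", DUMP_AND_REBOOT, "seconds")
```

The price, as Michael pointed out, is that any link failure can stall the whole
cluster for up to the chosen interval.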

Regards,
   Christoph Gartmann

-- 
 Max-Planck-Institut fuer      Phone   : +49-761-5108-464   Fax: -80464
 Immunbiologie und Epigenetik
 Postfach 1169                 Internet: gartmann at immunbio dot mpg dot de
 D-79011  Freiburg, Germany
               http://www.immunbio.mpg.de/home/menue.html