Author: James Courtier-Dutton
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] [OT] Idiots guide to Ethernet Switch failover needed.
On 20 December 2011 15:03, Paul Stimpson <paul@???> wrote:
>
>
> James Courtier-Dutton <james.dutton@???> wrote:
>
>>Hi,
>>
>>I have been given the task of explaining Layer 3 Ethernet switch
>>failover to a person who knows nothing about Ethernet or TCP/IP.
>
>>I made a stab at explaining HSRP and VRRP and why avoiding the "alone,
>>alone" scenario is a good idea etc., but let's just say, knowledge was
>>not transferred!
>>The problem is, if I fail to get the concept across correctly, actual
>>lives might be lost!!!  (It is going to be used in a safety related
>>communications system)
>>It is the job of the other person to decide if equipment X is good
>>enough for the job or not from a safety perspective.
>>
>
> I wouldn't try to explain the technicalities. From my perspective, it doesn't really matter how the piece of equipment does what it does or even exactly what it does. I would explain that this piece of equipment is responsible for handling all communications within the system. I would say that it, like every piece of equipment, will eventually fail and that, when it does, if there is no spare to automatically take over, the whole system will stop working until someone attends with a replacement. I'd then use phrases like "legal exposure" if someone dies in the however many hours it will take to get a spare and an engineer out when it fails at 2am on Christmas morning. I'd also point out that this risk to human life needs to be listed in the project risk assessment, mumble "duty of care", and say that if it's not mitigated and someone dies then the HSE could well prosecute.
>
> Then just tell them that for £x you can fit a second unit and set it up to automatically take over so all this goes away. You can also say that it will save money on middle-of-the-night engineer visits. I would follow up your advice in writing and, if they decline, insist they put it in an email so you are covered.
>


I wish I could do what you just said.
But the other person does need some understanding of the different
ways it can fail, because if a message fails to reach its destination
after X seconds of retries (X being less than 10), people get injured
or die. So, if a failover takes 20 seconds, or there is a failure type
that would cause an outage of more than 10 seconds, we need to choose
different kit/configuration that fails over quicker!
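
To make the timing argument concrete, here is a rough sketch of the
check I have in mind. It only covers VRRP failure detection, using the
timer formulas from RFC 3768 (VRRPv2). The 10 second budget, the 2
second allowance for everything else (ARP/MAC relearning, routing
reconvergence and so on) and the function names are my own
placeholders, not measured or vendor figures.

```python
# Sketch: does worst-case failure detection fit inside the retry budget?
# VRRP timer formulas follow RFC 3768 (VRRPv2); all other numbers are
# hypothetical placeholders that would have to be measured on real kit.

def vrrp_master_down_interval(advert_interval_s: float, backup_priority: int) -> float:
    """Seconds a VRRPv2 backup waits without adverts before taking over."""
    if not 1 <= backup_priority <= 254:
        raise ValueError("backup priority must be in 1..254")
    skew_time = (256 - backup_priority) / 256  # seconds
    return 3 * advert_interval_s + skew_time


def fits_budget(detection_s: float, other_delays_s: float, budget_s: float) -> bool:
    """True if detection plus all other delays stays strictly under the budget."""
    return detection_s + other_delays_s < budget_s


if __name__ == "__main__":
    budget = 10.0  # assumed upper bound on "X seconds of retries"
    other = 2.0    # assumed allowance for relearning/reconvergence
    for advert in (1.0, 3.0):
        detect = vrrp_master_down_interval(advert, backup_priority=100)
        ok = fits_budget(detect, other, budget)
        print(f"advert={advert}s detect={detect:.3f}s fits={ok}")
```

The point for the safety assessor is that the advertisement interval
dominates: with these placeholder numbers a 1 second interval fits the
budget and a 3 second interval does not, before any other failure mode
is even considered.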

Assume we have as many hot standby spares as we need.

So, a failure of many hours would be a catastrophe, i.e. a P1!
According to the requirements, the probability of P1 has to be very
low, and actions as a result would most probably involve a call to the
US President!

I know, they are trying to use Ethernet for something it was never
designed for, but hey, let them try, so long as it never gets to the
"operational" stage.
The item stopping it from going "operational" would be "Sorry, we
can't do better than 20 seconds"

Kind Regards

James

--
Please post to: Hampshire@???
Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
LUG URL: http://www.hantslug.org.uk
--------------------------------------------------------------