nanog mailing list archives

Re: Amazon diagnosis

From: Andrew Kirch <trelane () trelane net>
Date: Sun, 01 May 2011 14:18:47 -0400

On 5/1/2011 2:07 PM, Mike wrote:

I am still waiting for proof that single points of failure can
realistically be completely eliminated from any moderately complicated
network environment / application. So far, I think Murphy is still
winning on this one.


Sure they can, but as a thought exercise, full 2N redundancy is
difficult on a small scale for anything web-facing.  I've seen a very
simple implementation for a website requiring five nines (99.999%) that
consumed over $50k in equipment, and this wasn't even geographically
diverse.  I have to believe that scaling up the concept of "doing it
right" results in exponential cost increases.  To illustrate the problem,
I would give you the first step in the thought exercise: first find two
datacenters with diverse carriers that aren't on the same regional power
grid.  (As we learned in the (IIRC) 2003 power outage, New York and DC
won't work, nor will Ohio, so you need redundant teams to cover a very
remote site.)
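
For a rough sense of the numbers behind "five nines" and 2N, here is a
back-of-the-envelope sketch.  It assumes the redundant sites fail
independently, which is exactly the assumption a shared regional power
grid breaks.  The function names and the 99.9% per-site figure are
illustrative only, not taken from any real deployment.

```python
# Back-of-the-envelope availability arithmetic (illustrative only).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(site_availability, sites=2):
    """Combined availability of redundant sites.

    ASSUMPTION: site failures are statistically independent. Shared
    carriers or a shared power grid make the real figure worse.
    """
    return 1.0 - (1.0 - site_availability) ** sites


if __name__ == "__main__":
    # Five nines leaves only a few minutes of downtime per year.
    print(round(downtime_minutes_per_year(0.99999), 2))
    # Two hypothetical 99.9% sites, if truly independent.
    print(parallel_availability(0.999, sites=2))
```

Two three-nines sites only add up to roughly six nines on paper if
nothing ties their failures together, which is why the carrier and power
grid question comes first.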
