
nanog mailing list archives

Re: dns and software, was Re: Reliable Cloud host ?
From: Owen DeLong <owen () delong com>
Date: Wed, 29 Feb 2012 10:01:12 -0800

On Feb 29, 2012, at 6:18 AM, William Herrin wrote:

On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco () ns sol net> wrote:
In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q () mail gmail com>,
 William Herrin writes:
On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka () isc org> wrote:
DNS TTL works. Applications that don't honour it aren't an indication that
it doesn't work.


If three people died and the building burned down then the sprinkler
system didn't work. It may have sprayed water, but it didn't *work*.

Not enough evidence to say if it worked or not.  Sprinkler systems
are designed to handle particular classes of fire, not every fire.

It is also worth noting that many fire systems are not intended to
put out the fire, but to provide warning and then provide an extended
window for people to exit the affected building through use of sprinklers
and other measures to slow the spread of the fire.

Hi Joe,

The sprinkler system is designed to delay the fire long enough for
everyone to safely escape. As a secondary objective, it reduces the
fire damage that occurs while waiting for firefighters to arrive and
extinguish the fire. If "three people died" then the system failed.
Perhaps the design was inadequate. Perhaps some age-related issue
prevented the sprinkler heads from melting. Perhaps someone stacked
boxes to the ceiling and it blocked the water. Perhaps the water was
shut off and nobody knew it. Perhaps an initial explosion damaged the
sprinkler system so it could no longer work effectively. Whatever the
exact details, that sprinkler system failed.

Bill, you are blaming the sprinkler system for what could, in fact, be not
a failure of the sprinkler system, but, of the 3 humans.

If they were too intoxicated or stoned to react, for example, the sprinkler
system is not to blame. If they were overcome by smoke before the
sprinklers went off, that may be a failure of the smoke detectors, but, it
is not a failure of the sprinklers. If they were killed or rendered unconscious
and/or unresponsive in the preceding explosion you mentioned and did
not die in the subsequent fire, then, that is not a failure in the sprinkler
system.

Whoever you want to blame, DNS TTL dysfunction at the application
level is the same way. It's a failed system. With the TTL on an A
record set to 60 seconds, you can't change the address attached to the
A record and expect that 60 seconds later no one will continue to
connect to the old address. Nor 600 seconds later nor 6000 seconds
later. The "system" for renumbering a service of which the TTL setting
is a part consistently fails to reliably function in that manner.

Yes, the assumption by developers that gni/gai is a fire-and-forget
mechanism and that the data received is static is a failure. It is not a
failure of DNS TTL. It is a failure of the application developers that
code that way. Further analysis of the underlying causes of that failure
to properly understand name resolution technology and the environment
in which it operates is left as an exercise for the reader.

The fact that people playing interesting games with DNS TTLs don't
necessarily understand the situation, or document it well enough to raise
awareness among application developers, could also be argued to be a
failure on the part of those people.

It is not, in either case, a failure of the technology.

One should always call gni/gai (getnameinfo()/getaddrinfo()) close in
time to calling connect(), and ideally close to it in the code as well.
Obviously, the resolver call has to come before the connect() call.

Most example code is designed for short-lived non-recovering flows,
so, it's designed along the lines of resolve->(iterate through results
calling connect() for each result until connect() succeeds)->process->
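
For illustration, the resolve-then-iterate pattern described above might look
like the following in Python, whose socket module wraps the same
getaddrinfo()/connect() calls. This is a minimal sketch, not code from the
thread: the function name connect_by_name and the 5-second timeout are
assumptions made for the example.

```python
import socket

def connect_by_name(host, port, timeout=5.0):
    """Resolve host, then try each result until connect() succeeds."""
    last_error = None
    # Resolve immediately before connecting; never reuse older results.
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in results:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock              # first address that answers wins
        except OSError as exc:
            last_error = exc         # remember the failure, try the next one
            sock.close()
    if last_error is None:
        raise OSError("no addresses returned for %r" % (host,))
    raise last_error
```

The standard library's socket.create_connection() does essentially the same
thing, so real code would normally call that instead.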

Examples for persistent connections and/or connections that recover
or re-establish after a failure and/or browsers that stay running for a
long time and connect to the same system again significantly later
are few and far between. As a result, most code doing that ends up
being poorly written.
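
As a rough sketch of what such an example could look like, the Python class
below keeps only the host name and goes back through the resolver every time
it has to reconnect, so it picks up whatever address the name currently
resolves to. The class name ReconnectingClient, the retry count and the
backoff values are all hypothetical choices for illustration.

```python
import socket
import time

class ReconnectingClient:
    """Persistent client that re-resolves the name on every reconnect."""

    def __init__(self, host, port, connect=socket.create_connection):
        self.host = host          # store the NAME, never a resolved address
        self.port = port
        self._connect = connect   # injectable so it can be tested offline
        self._sock = None

    def _ensure_connected(self):
        if self._sock is None:
            # create_connection() calls getaddrinfo() and then connect(),
            # so resolution happens right before each new connection.
            self._sock = self._connect((self.host, self.port))
        return self._sock

    def send(self, data, retries=3, backoff=0.5):
        for attempt in range(retries):
            try:
                self._ensure_connected().sendall(data)
                return
            except OSError:
                self.close()      # drop the socket; next try resolves again
                if attempt == retries - 1:
                    raise
                time.sleep(backoff * (attempt + 1))

    def close(self):
        if self._sock is not None:
            self._sock.close()
            self._sock = None
```

Whether the answer the resolver hands back is fresh still depends on the
caches underneath honouring the TTL; the point here is only that the
application itself does not hold on to an address across connections.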

Further, DNS performance issues in the past have led developers of
such applications to "take matters into their own hands" to try and
improve the performance/behavior of their application in spite of
DNS. This is one of the things that led to many of the TTL-ignorant
application-level DNS caches which you are complaining about.

Again, not a failure of DNS technology, but, of the operators of that
technology and the developers that tried to compensate for those
failures. They introduced a cure that is often worse than the disease.
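
For contrast, an application-level cache that does respect the TTL need not
be complicated. The sketch below is hypothetical: it assumes a lookup
function that returns both the addresses and the record's TTL in seconds,
which getaddrinfo() does not provide, so a real implementation would need a
resolver library that exposes the TTL. The names TtlCache and lookup are
made up for the example.

```python
import time

class TtlCache:
    """Caches lookup results, but only for as long as the TTL allows."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup   # hypothetical: name -> (addresses, ttl_seconds)
        self._clock = clock
        self._entries = {}      # name -> (addresses, expiry time)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]     # still within the TTL, safe to reuse
        # Missing or expired: ask the resolver again and restart the timer.
        addresses, ttl = self._lookup(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses
```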


Bill Herrin

William D. Herrin ................ herrin () dirtside com  bill () herrin us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
