nanog mailing list archives

Re: dns and software, was Re: Reliable Cloud host ?


From: Owen DeLong <owen () delong com>
Date: Fri, 2 Mar 2012 12:59:46 -0800


On Mar 2, 2012, at 10:12 AM, William Herrin wrote:

On Fri, Mar 2, 2012 at 1:03 AM, Owen DeLong <owen () delong com> wrote:
On Mar 1, 2012, at 9:34 PM, William Herrin wrote:
You know, when I wrote 'socket=connect("www.google.com",80,TCP);' I
stopped and thought to myself, "I wonder if I should change that to
'connectbyname' instead just to make it clear that I'm not replacing
the existing connect() call?" But then I thought, "No, there's a
thousand ways someone determined to misunderstand what I'm saying will
find to misunderstand it. To someone who wants to understand my point,
this is crystal clear."

"Hyperbole." If I had remembered the word, I could have skipped the
long description.

I'm all for additional library functionality
I just don't want connect() to stop working the way it does or for getaddrinfo() to stop
working the way it does.

Good. Let's move on.


First question: who actually maintains the standard for the C sockets
API these days? Is it a POSIX standard?


Well, some of it seems to be documented in RFCs, but I think what you're wanting doesn't require additions to the sockets
library, per se. In fact, I think wanting to make it part of that is a mistake. As I said, this should be a
higher-level library.

For example, in Perl, you have Socket (and Socket6), but, you also have several other abstraction libraries such as 
Net::HTTP.

While there's no hierarchical naming scheme for the functions in libc, if you look at the source for any of the open
source libc libraries out there, you'll find a definite hierarchy.

POSIX certainly controls one standard. The GNU libc maintainers control the standard for the libc that accompanies GCC 
to the best of my knowledge. I would suggest that is probably the best place to start since I think anything that gains 
acceptance there will probably filter to the others fairly quickly.

Next, we have a set of APIs with which, with sufficient caution and skill
(which is rarely the case), it's possible to string together a
reasonable process which starts with some kind of name in a text
string and ends with established communication with a remote server
for any sort of name and any sort of protocol. These APIs are complete,
but we repeatedly see certain kinds of error committed while using
them.


Right... Since these are user errors (at the developer level) I wouldn't try to fix them in the APIs. I would, instead,
build more developer-proof add-on APIs on top of them.

Is there a common set of activities an application programmer intends
to perform 9 times out of 10 when using getaddrinfo+connect? I think
there is, and it has the following functionality:

Create a [stream] to one of the hosts satisfying [name] + [service]
within [timeout] and return a [socket].


Seems reasonable, but it ignores UDP. If we're going to do this, I think we should target a more complete solution to
include a broader range of possibilities than just the most common TCP connect scenario.
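The sort of higher-level call being discussed might look something like the sketch below. The name connectbyname and its signature are purely illustrative (no such function exists in libc); it just layers getaddrinfo() and connect() in the obvious way, trying each candidate address in order:

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical high-level helper of the kind discussed in this thread.
 * Resolves name + service with getaddrinfo() and tries each candidate
 * address in order until one connect() succeeds.  Returns a connected
 * file descriptor, or -1 on total failure.
 */
int connectbyname(const char *name, const char *service, int socktype)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = socktype;     /* SOCK_STREAM or SOCK_DGRAM */

    if (getaddrinfo(name, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Note that passing SOCK_DGRAM works too, which is one way to avoid baking the TCP-only assumption into the interface.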

Does anybody disagree? Here's my reasoning:

Better than 9 times out of 10 it's a stream, and usually a TCP stream at
that. Connect also designates a receiver for a connectionless protocol
like UDP, but its use for that has always been a little peculiar since
the protocol doesn't actually connect. And indeed, sendto() can
designate a different receiver for each packet sent through the
socket.


Most applications using UDP that I have seen use sendto()/recvfrom() et al. Netflow data would suggest that it's less
than 9 out of 10 for TCP, but, yes, I would agree it is the most common scenario.

Name + Service. If TCP, a hostname and a port.

That would apply to UDP as well. Just the semantics of what you do once you have the filehandle are different. (and 
it's not really a stream, per se).

Sometimes you want to start multiple connection attempts in parallel
or have some not-quite-threaded process implement its own scheduler
for dealing with multiple connections at once, but that's the
exception. Usually the only reason for dealing with the connect() in
non-blocking mode is that you want to implement sensible error recovery
with timeouts.


Agreed.

And the timeout - the direction that control should be returned to the
caller no later than X. If it would take more than X to complete, then
fail instead.


Actually, this is one thing I would like to see added to connect() and that could be done without breaking the existing 
API.
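A timeout can in fact be layered on top of the existing API today without changing connect() itself, which is roughly what most careful implementations do by hand: put the socket in non-blocking mode, start the connect, wait with poll(), then check SO_ERROR. A minimal sketch (the function name is mine, not a standard one):

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Sketch of connect-with-timeout layered on the existing API:
 * non-blocking connect, poll() for writability up to timeout_ms,
 * then read SO_ERROR to learn the outcome.
 * Returns 0 on success, -1 on error or timeout.
 */
int connect_with_timeout(int fd, const struct sockaddr *sa, socklen_t len,
                         int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    int err = 0;
    socklen_t errlen = sizeof err;
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };

    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    if (connect(fd, sa, len) == 0)
        goto done;                  /* connected immediately (e.g. loopback) */
    if (errno != EINPROGRESS)
        return -1;                  /* hard failure right away */

    if (poll(&pfd, 1, timeout_ms) != 1)
        return -1;                  /* timed out (or poll error) */

    /* The attempt finished within the deadline; find out how it went. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 || err != 0)
        return -1;
done:
    fcntl(fd, F_SETFL, flags);      /* restore original blocking mode */
    return 0;
}
```

The point of the sketch is that the existing connect() semantics are untouched; the deadline lives entirely in the wrapper.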



Next item: how would this work under the hood?

Well, you have two tasks: find a list of candidate endpoints from the
name, and establish a connection to one of them.

Find the candidates: ask all available name services in parallel
(hosts, NIS, DNS, etc). Finished when:

1. All services have responded negative (failure)

2. You have a positive answer and all services which have not yet
answered are at a lower priority (e.g. hosts answers, so you don't
need to wait for NIS and DNS).

3. You have a positive answer from at least one name service and 1/2
of the requested time out has expired.

4. The full time out has expired (failure).


I think the existing getaddrinfo() does this pretty well already.

I will note that the services you listed only apply to resolving the host name. Don't forget that you might also need 
to resolve the service to a port number. (An application should be looking up HTTP, not assuming it is 80, for example).

Conveniently, getaddrinfo simultaneously handles both of these lookups.
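To illustrate the point about looking up the service rather than assuming the port: a single getaddrinfo() call resolves both the host name and a service name like "http". The helper below (my own illustrative wrapper, not a standard function) pulls the port out of the first result:

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * getaddrinfo() resolves host and service together, so an application
 * can pass "http" instead of hard-coding 80.  Fills in the port of the
 * first result; returns 0 on success, non-zero on failure.
 */
int first_port(const char *host, const char *service, int *port)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    if (res->ai_family == AF_INET)
        *port = ntohs(((struct sockaddr_in *)res->ai_addr)->sin_port);
    else
        *port = ntohs(((struct sockaddr_in6 *)res->ai_addr)->sin6_port);

    freeaddrinfo(res);
    return 0;
}
```

Calling first_port("www.example.com", "http", &port) would consult both the resolver and the services database in one shot.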

Cache the knowledge somewhere along with TTLs (locally defined if the
name service doesn't explicitly provide a TTL). This may well be the
first of a series of connection requests for the same host. If cached
and TTL valid knowledge was known for this name for a particular
service, don't ask that service again.


I recommend against doing this above the level of getaddrinfo(). Just call getaddrinfo() again each time you need
something. If it has cached data, it will return quickly and is cheap. If it doesn't have cached data, it will most
likely still work as quickly as anything else would.

If getaddrinfo() on a particular system is not well behaved, we should seek to fix that implementation of 
getaddrinfo(), not write yet another replacement.

Also need to let the app tell us to deprioritize a particular result
later on. Why? Let's say I get an HTTP connection to a host but then
that connection times out. If the app is managing the address list, it
can try again to another address for the same name. We're now hiding
that detail from the app, so we need a callback for the app to tell
us, "when I try again, avoid giving me this answer because it didn't
turn out to work."


I would suggest that instead of making this opaque and then complicating
it with these hints when we return, we use a mechanism where we
return a pointer to a dynamically allocated result (similar to getaddrinfo()) and,
if we get called again with a pointer to that structure, we know to delete the
previously connected host from the list we try next time.

When the application is done with the struct, it should free it by calling an
appropriate free function exported by this new API.


So, now we have a list of addresses with valid TTLs as of the start of
our connection attempt. Next step: start the connection attempt.

Pick the "first" address (chosen by whatever the ordering rules are)
and send the connection request packet and let the OS do its normal
retry schedule. Wait one second (system or sysctl configurable) or
until the previous connection request was either accepted or rejected,
whichever is shorter. If not connected yet, background it, pick the
next address and send a connection request. Repeat until one
connection request has been issued to all possible destination
addresses for the name.

Finished when:

1. Any of the pending connection requests completes (others are aborted).

2. The time out is reached (all pending requests aborted).

Once a connection is established, this should be cached alongside the
address and its TTL so that next time around that address can be tried
first.


Seems mostly reasonable. I would consider possibly having some form of inverse exponential backoff on the initial 
connection attempts. Maybe wait 5 seconds for the first one before trying the second one and waiting 2 seconds, then 1 
second if the third one hasn't connected, then bottoming out somewhere around 500ms for the remainder.
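The suggested schedule is easy to express as a function of the attempt number. The numbers below are the ones from the paragraph above (5s, 2s, 1s, then a 500ms floor); they are an illustration, not a standard:

```c
/*
 * Staggered delay before launching the next parallel connection
 * attempt: 5s before the second, 2s before the third, 1s before the
 * fourth, then 500ms between each attempt after that.
 * attempt is 1-based: attempt 1 is the wait after the first connect.
 */
int next_stagger_ms(int attempt)
{
    static const int schedule_ms[] = { 5000, 2000, 1000 };
    int n = sizeof schedule_ms / sizeof schedule_ms[0];

    if (attempt < 1)
        return 0;
    if (attempt <= n)
        return schedule_ms[attempt - 1];
    return 500;     /* floor for the remainder */
}
```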



Since you were hell-bent on calling the existing mechanisms broken rather than
conceding the point that the current process is not broken, but could stand some
improvements in the library

I hold that if an architecture encourages a certain implementation
mistake largely to the exclusion of correct implementations then that
architecture is in some way broken. That error may be in a particular

I don't believe that the architecture encourages the implementation mistake.

Rather, I think human behavior and our tendency not to seek proper understanding of the theory of operation of various 
things prior to implementing things which depend on them is more at fault. I suppose that you can argue that the API 
should be built to avoid that, but, we'll have to agree to disagree on that point. I think that low-level APIs (and 
this is a low-level API) have to be able to rely on the engineers that use them making the effort to understand the 
theory of operation. I believe that the fault here is the lack of a standardized higher-level API in some languages.

component, but it could be that the components themselves are correct.
There could be a missing component, or the components could be strung
together in a way that doesn't work right. Regardless of the exact
cause, there is an architecture-level mistake which is the root cause
of the consistently broken implementations.


I suppose by your definition this constitutes a missing component. I don't see it that way. I see it as a complete and 
functional system for a low-level API. There are high-level APIs available. As you have noted, some better than others. 
A standardized well-written high-level API would, indeed, be useful. However, that does not make the low-level API 
broken just because it is common for poorly trained users to make improper use of it. It is common for people using 
hammers to hit their thumbs. This does not mean that hammers are architecturally broken or that they should be 
re-engineered to have elaborate thumb-protection mechanisms.

The fact that you can electrocute yourself by sticking a fork into a toaster while it is operating is likewise, not an 
indication that toasters are architecturally broken.

It is precisely this attitude that has significantly increased the overhead and unnecessary expense of many systems 
while making product liability lawyers quite wealthy.

Owen


