Re: kernelizing the network resolver

V Guruprasad <vgprasad@IEEE.ORG> · Mon, 4 Nov 2002 00:27:22 -0500

Dear John,

On Fri 2002.11.01, John Stracke wrote:

> V Guruprasad wrote:
>
> >- eliminates sockaddr_t handling in the user space, allowing
> > application code to become free of IPv4/IPv6 (or for that matter
> > raw Ethernet or ATM) dependencies;
> >
> Doesn't using a shared library for the resolver give you the same
> benefit? It's in user space, but it's not in the app.

Yes, sockaddr_t can be eliminated using a shared library, but in order
to do that, we must replace the following steps:

	server side:
		1  hostent_t haddr = gethostbyname (char* namepath);
		2  sockaddr_t sockaddr = { haddr, int port };
		3  fd_t sock = socket (domain, type, protocol);
		4  bind (sock, sockaddr, sizeof (sockaddr));
		5  listen (sock, backlog);	 /* optional */

	client side:
	[1,2,3] and
		7  int addrlen = sizeof (sockaddr);
		8  connect (sock, sockaddr, addrlen);

with one function call
	9  fd_t sock = sockaddressless_socket_open (
				char* namepath,
				int type,
				int protocol,
				int port
				);
This is only one step removed from the INFS version
	10  fd_t sock = open (char* namepath, int flags, [int omode]);
if the port, type and protocol parameters are included in the namepath,
	e.g. /com/ibm/research/www/rtsptoolkit/tcp:80.
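
To make the path convention concrete, here is a minimal sketch of how the
trailing "protocol:port" component could be separated from the name part.
The helper name split_namepath and its buffer-based interface are
hypothetical, chosen only for illustration; this is not the INFS code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of INFS: splits
 * "/com/ibm/research/www/rtsptoolkit/tcp:80" into the name part
 * "/com/ibm/research/www/rtsptoolkit", the protocol "tcp" and port 80.
 * Returns 0 on success, -1 on malformed input or short buffers. */
static int split_namepath(const char *path,
                          char *name, size_t namelen,
                          char *proto, size_t protolen,
                          int *port)
{
    const char *slash = strrchr(path, '/');
    if (slash == NULL || slash == path)
        return -1;              /* no name part before the last component */

    const char *colon = strchr(slash + 1, ':');
    if (colon == NULL || colon == slash + 1)
        return -1;              /* last component is not "proto:port" */

    size_t nlen = (size_t)(slash - path);
    size_t plen = (size_t)(colon - (slash + 1));
    if (nlen >= namelen || plen >= protolen)
        return -1;              /* caller's buffers are too small */

    char *end;
    long p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
        return -1;              /* port missing, trailing junk, or out of range */

    memcpy(name, path, nlen);
    name[nlen] = '\0';
    memcpy(proto, slash + 1, plen);
    proto[plen] = '\0';
    *port = (int)p;
    return 0;
}
```

With the parameters carried in the path like this, the caller has nothing
left to pass besides the name and the usual open flags.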

The real motivation for taking this extra step was the thesis that
emerged in the incomplete sigcomm02 submission, for which we (my
immediate ex-colleagues and I) had two important ingredients begging
for a top-down architectural story:

A) an alternative namespace-based "addressless networking" algorithm
   presented at ECUMN'00, together with a prototype implementation
   (as an MS project) demo'd at INET'01; and

B) an address/route auto-aggregation algorithm comprising a Huffman-like
   recursive address assignment scheme applied to Dijkstra routing trees.

(A) says that a distributed network namespace is sufficient as a
primary addressing mechanism (the proof involves a basic property of
an unshared distributed tree as a self-defining network address space),
i.e. not depending on a manually managed/coordinated numeric address
space like IP. Thus, (A) facilitates (B) and (B) motivates (A).

However, this also means that the end-to-end-ness or long-term stationarity
of the numeric (IP) addresses should no longer be taken for granted, and
that names should be used instead as the primary reference. This makes
it imperative to consider an alternative networking API that uses names
as addresses. Since the ordinary notion of what constitutes "system"
and what "application/user" concerns the operating system boundary, a
system call interface of this form merited consideration. The fact that
open(2) already has almost the desired form, and caters to a hierarchical
(filesystem) namespace as well, made the INFS approach all the more
interesting to try out.
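
As a rough illustration of what application code could look like under
such an interface, the sketch below uses nothing but open(2), write(2),
read(2) and close(2). It assumes a filesystem that resolves INFS-style
names is mounted; the function name request_by_name is made up for this
example, and on a system without such a mount the open() simply fails.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical example: sends `req` to the endpoint named by `namepath`
 * and reads one reply into `reply` (NUL-terminated).
 * Returns the number of bytes read, or -1 on any error.
 * A short write is treated as an error to keep the sketch small. */
ssize_t request_by_name(const char *namepath, const char *req,
                        char *reply, size_t replylen)
{
    if (replylen == 0)
        return -1;

    /* Step 10 from the list above: no sockaddr, no socket(), no connect(). */
    int fd = open(namepath, O_RDWR);
    if (fd < 0)
        return -1;

    size_t len = strlen(req);
    ssize_t n = -1;
    if (write(fd, req, len) == (ssize_t)len)
        n = read(fd, reply, replylen - 1);
    if (n >= 0)
        reply[n] = '\0';

    close(fd);
    return n;
}
```

Nothing in this function depends on IPv4, IPv6 or any other address
family; that choice is left entirely to whatever sits behind the path.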

Yes, a shared library is still the way to go on many platforms, especially
on Windows where the socket implementation itself comes from DLLs. The
filesystem interface is more restrictive, however, and provides a stronger
test of the sufficiency of the filesystem/file operations paradigm.
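
For comparison, the shared-library route could be sketched roughly as
follows for the client side, using the standard getaddrinfo(3) call so
that the application never touches a sockaddr. The function name and
parameter list follow step 9 above; the body is an assumption about how
such a library might be written, not an existing implementation.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical library entry point (client side only).
 * Resolves `name`, then tries each returned address, IPv4 or IPv6,
 * until one connects. Returns a connected descriptor, or -1. */
int sockaddressless_socket_open(const char *name, int type,
                                int protocol, int port)
{
    struct addrinfo hints, *res, *ai;
    char service[16];
    int fd = -1;

    if (port < 0 || port > 65535)
        return -1;
    snprintf(service, sizeof service, "%d", port);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;    /* accept whatever family the resolver returns */
    hints.ai_socktype = type;
    hints.ai_protocol = protocol;

    if (getaddrinfo(name, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                  /* connected; keep this descriptor */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The parsing and address handling still exist once per library here,
whereas the filesystem approach would push them behind the open() call.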

> >- reduces the number of context switches going from application
> > to resolver and back;
> >
> Do you have data showing these context switches are a problem? To me, it
> seems like you're optimizing something that doesn't take up that much
> time anyway--what apps spend that much CPU time on DNS lookups?

The context switching reduction was intended only to point out
that performance is likely to improve rather than worsen. However,
yes, measuring it is one of the things on the to-do list, but I don't
know how soon I can get around to it given my current resources (being
out of a job!).

> >- reduces the overall code footprint - the filesystem name tree
> > cache is reused, sockaddr_t handling code in applications gone.
> >
> Again, shared libs also reduce duplicate code (though not data; for that
> you do need the kernel, or a daemon).

The code reduction is *slightly* more than with just a shared library:
with an slib, duplication between apps is avoided, but there is at
least one slib implementation of the parsing and name caching code. With
the INFS approach, even this much of the slib would be eliminated,
as the VFS already contains similar code, which would be reused.

I wholeheartedly agree that this much of code reduction is not all
that big a diff today, as memory and cpu cycles are quite cheap and
becoming even cheaper by the minute, but if a reduction is possible,
it's always educational to try it out. However, the sockaddr_t and
VFS integration were the main motivations.

thanks,
-prasad.