On Fri, Jun 12, 2009 at 12:41 AM, Sean Conner<spc@xxxxxxxxxx> wrote:
> It was thus said that the Great Vinay Nagrik once stated:
>> Thank you Andrew and Tom,
>>
>> Thank you for your insightful replies. These have definitely helped me in
>> understanding the major issues.
>>
>> At this moment I can not understand "How a 'Connection' is passed from parent
>> process to child process."
>>
>> My understanding is
>>
>> "a connection is a combination of (IP address + port.) and the parent
>> process listens at one such address or multiple such addresses in virtual
>> host interfaces."
>>
>> Let us assume the parent process is listening to only one such address i.e.
>> (IP address + port). Then if this connection is passed to the child then
>> this connection must be blocked and this is the only connection which will
>> be multiplexed among several child processes as well as the parent process. My
>> point is that concurrency can not be achieved on a single connection (IP
>> address + port) unless I am missing something fundamental about the
>> definition of "Connection."
>>
>> Secondly if a connection is passed to the child then once again the child
>> process will have to make a three way handshake to the original client to
>> service the request.
>>
>> I hope Andrew or someone from the group can clear my doubts.
>
> This is for Unix and Unix-like operating systems. Your mileage will vary
> with other operating systems.
>
> When Apache starts, it creates a listening TCP socket. In the kernel,
> this socket will look something like:
>
>     protocol  localhost     port  remotehost  remoteport
>     TCP       192.168.1.23  80    0.0.0.0     0
>
> In other words, a half connected socket (in reality, the localhost portion
> can also be 0.0.0.0, which means listen on all interfaces that support an IP
> address, but for the sake of argument, let's say we only want Apache to
> listen on a particular interface).
> The code to create this typically (if not spread out) looks like:
>
>     struct sockaddr addr;
>     int             sock;
>
>     /*
>      * this creates space for a TCP socket
>      */
>
>     sock = socket(AF_INET,SOCK_STREAM,0);
>
>     /*
>      * fill in the address we want to listen on; the real code
>      * is not quite this way, but the actual details
>      * would only get in the way ...
>      */
>
>     addr.family = AF_INET;
>     addr.host   = 192.168.1.23;
>     addr.port   = 80;
>
>     /*
>      * now, connect the address/port to the socket
>      * we just created
>      */
>
>     bind(sock,&addr,sizeof(addr));
>
> So now we have our side of the socket created (see above). Now, onto real
> work. Apache (and I'm assuming the pre-fork version here) will create the
> child processes to handle actual requests. This is done via the fork()
> call (which creates a duplicate of the calling process). As part of this
> fork() call, the child process will see this socket as well [1], but since
> it doesn't handle incoming connections, the child can then close its copy of
> the socket (which won't affect the socket in the main parent process). The
> child process then changes its effective user id to some lower privileged
> account, and then waits for the parent to give it some work to do.
>
> The parent process, however, continues on and tells Unix it is ready
> to accept network connections.
>
>     /*
>      * tell Unix we want to accept connections on this port. The
>      * 5 value is the size of the backlog---the number of incoming
>      * connections the kernel will queue up for us while we're busy
>      * doing other stuff ... more on this in a bit
>      */
>
>     listen(sock,5);
>
> So now Unix knows the main Apache process wants to accept connections on
> TCP port 80.
> Then the main Apache process enters a loop that looks like:
>
>     struct sockaddr remote_addr;
>     socklen_t       remote_size; /* size of remote address */
>     int             connection;
>
>     for ( ; ; ) /* ever */
>     {
>       /*
>        * we'll accept connections from anywhere, and from any port
>        */
>
>       remote_addr.family = AF_INET;
>       remote_addr.host   = ANY_IP;
>       remote_addr.port   = ANY_PORT;
>       remote_size        = sizeof(remote_addr);
>       connection         = accept(sock,&remote_addr,&remote_size);
>
>       /*
>        * between now and the time we get back to the accept()
>        * call, the Unix kernel will queue up to five connection
>        * requests. More on this below ...
>        * Meanwhile, pass this socket to a child process ...
>        */
>
>       pass_connection_to_child(connection);
>
>       /*
>        * now that we have passed the socket on, the parent
>        * no longer needs its copy of the socket, so it can
>        * close it, and cycle back for another connection.
>        */
>
>       close(connection);
>     }
>
> The accept() call blocks Apache until an incoming connection to port 80 is
> initiated (or one or more are pending). It then returns a new socket for
> this connection (the remote address is stored in remote_addr, and the size
> of this structure is also returned in remote_size---the network stack under
> Unix can work with more than just IP and different network protocols have
> different size addresses; for instance, while an IPv4 address:port is 6 bytes
> (four for address, two for port), an IPv6 address:port will be 18 bytes).
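
For anyone who wants to poke at the accept() step described above, here is
a minimal sketch in real C. It is only an illustration under a few
assumptions: it listens on 127.0.0.1 with a kernel-chosen port instead of
192.168.1.23:80, it plays the client itself from the same process, and
accept_demo() is a made-up helper name, not anything from Apache. It shows
that accept() hands back a second socket and fills in the remote address
and its size. Note that the size reported is that of the address structure
(struct sockaddr_in here, typically 16 bytes), not the 6 raw bytes of
address plus port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on 127.0.0.1 (port picked by the kernel),
 * connect to it from this same process, accept() the connection, and
 * report what accept() filled in. Returns 0 on success, -1 on failure.
 */
int accept_demo(struct sockaddr_in *remote, socklen_t *remote_size,
                struct sockaddr_in *client_local)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int sock = -1, client = -1, connection = -1, rc = -1;

    assert(remote != NULL && remote_size != NULL && client_local != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* 0 = let the kernel choose */

    /* the listening ("half connected") socket */
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) goto done;
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(sock, 5) < 0) goto done;

    /* find out which port the kernel assigned */
    if (getsockname(sock, (struct sockaddr *)&addr, &len) < 0) goto done;

    /* the client side; the kernel queues this connection in the backlog */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* accept() returns a NEW socket and fills in the remote address */
    *remote_size = sizeof(*remote);
    connection = accept(sock, (struct sockaddr *)remote, remote_size);
    if (connection < 0) goto done;

    /* the client's local address, for comparison with *remote */
    len = sizeof(*client_local);
    if (getsockname(client, (struct sockaddr *)client_local, &len) < 0)
        goto done;
    rc = 0;

done:
    if (connection >= 0) close(connection);
    if (client >= 0) close(client);
    if (sock >= 0) close(sock);
    return rc;
}
```

Calling accept_demo() from a small main() and comparing remote->sin_port
with client_local->sin_port shows that the "remote" half of the accepted
socket is simply the client's own address and port.
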
>
> So, now we have:
>
> var         protocol  localhost     port  remotehost   remoteport  process
> --------------------------------------------------------------------------
> sock        TCP       192.168.1.23  80    0.0.0.0      0           main
> connection  TCP       192.168.1.23  80    173.45.15.4  45234       main
>
> The parent process then takes the connection socket, and passes it on to
> an available child process to handle---once the socket is passed on to the
> child (and no, the three-way TCP handshake does not have to happen again,
> the connected socket is passed from the parent to the child process), the
> parent can then close its copy of the connection socket (which won't affect
> the connection, nor the connection socket the child process now has), and go
> back to handle a new connection by calling accept() on the half created
> listening socket.
>
> var         protocol  localhost     port  remotehost   remoteport  process
> --------------------------------------------------------------------------
> sock        TCP       192.168.1.23  80    0.0.0.0      0           main
> connection  TCP       192.168.1.23  80    173.45.15.4  45234       child
>
> It's the time between accept() requests that the queue limit given in the
> listen() call comes into play. During the time between calls to accept(),
> the Unix kernel will queue up pending connection requests (the value 5 is
> the traditional value for this, but the early BSD kernels pretty much
> assumed this value would always be 5, and acted oddly if it wasn't, so that's
> why you see this value used in much sample network code, but I digress)---it
> has nothing to do with the total number of requests that can be handled,
> just the number of requests that will be held between calls to accept().
>
> How the socket is passed from the parent to the child will not be covered
> (as it would only cloud the issue since it takes 11 pages in _Advanced
> Programming in the Unix Environment_ to cover this particular issue---it's
> ... messy) but just assume It Works.
>
> -spc (I hope this clears things up ...)
>
> [1] The socket is technically an open file descriptor, and open file
>     descriptors are "duplicated" [2] during a call to fork(), and the
>     process (parent or child) that doesn't need access can close its copy
>     without affecting the other.
>
> [2] The file descriptor is really an index into a table of open files a
>     process can use. This table is maintained by the kernel (the
>     process can't "see" this table at all), and what's really duplicated
>     is this table, which contains references to other structures that
>     define the location of the file on disk.
>
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP Server Project.
> See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
> " from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
> For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
>
>

Kickass! Sean and Andre, thank you so much for the excellent descriptions.
As someone who has done only a little C a decade ago, I think you've done an
awesome job of explaining how Apache works at just the right level and in
terms that are understandable and informative. I don't have anything to add,
just wanted you all to know your effort is greatly appreciated.
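
On the one step the quoted explanation skips (how an already-connected
socket gets from one process to another): one mechanism available on
Unix-like systems is to send the descriptor as SCM_RIGHTS ancillary data
over a Unix-domain socket. The sketch below is only an illustration of that
mechanism. The names send_fd() and recv_fd() are made up for this example,
and nothing here is claimed to be what Apache itself does.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned for struct cmsghdr, room for one int. */
union fd_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/*
 * Hypothetical helper: send descriptor fd_to_send over the Unix-domain
 * socket chan. Returns 0 on success, -1 on failure.
 */
int send_fd(int chan, int fd_to_send)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    union fd_ctrl ctrl;
    char byte = 'F';                 /* at least one byte of real data */

    assert(fd_to_send >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* the descriptor travels as SCM_RIGHTS ancillary data */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor number, or -1 on failure.
 */
int recv_fd(int chan)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    union fd_ctrl ctrl;
    char byte;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    /* the kernel has installed a new descriptor in this process */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In a parent/child setup one would create the channel with
socketpair(AF_UNIX, SOCK_STREAM, 0, chan) before fork(), have the parent
call send_fd(chan[0], connection) followed by close(connection), and have
the child call recv_fd(chan[1]). The descriptor the child receives refers
to the same open connection, so no new TCP handshake takes place.
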