After a period of uptime, chunkd may stop working with this:

May 20 08:51:47 azdragon2 chunkd[4034]: tcp accept: Too many open files

An examination with lsof shows that file descriptors for sockets and
object data files are leaked in neat pairs.

As it turns out, the root cause is a failure to handle the case when
tabled opens a connection to read an object, then closes it before the
data is transferred. On some systems, sendfile returns no error in such
a case, but instead returns the amount of data that it attempted to send
before it recognized that the socket was closed. If that happens, chunkd
will not receive a POLLOUT indication and the struct client will linger
forever with a non-empty write queue.

The fix has two parts:

1. Permit a client in evt_recycle state to process outstanding writes
   in the same manner a client in evt_dispose does. Note that in our
   specific failure case no actual processing is going to occur, so
   the only effect of this part is to permit the dispatch to work.
   If we do not do this, a POLLIN may throw us into the evt_read_fixed
   stage.

2. Once we're getting dispatched, dispose of clients whose connections
   were closed, using the unmaskable POLLHUP bit.

As an aside, tabled 0.5-0.7.x resets the connections when Firefox asks
for a file that was modified after a certain date. In that case, tabled
wants to know when the file was modified, so it reads the header off
chunkd. If it turns out that the client is not interested in the data,
tabled simply closes the connection without reading whatever data has
arrived. This may change in the future, but the bug in chunkd should be
fixed anyway, for general robustness.

Signed-off-by: Pete Zaitcev <zaitcev@xxxxxxxxxx>

---
 server/server.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

commit a217892610de6c38453b2f63605880de43ec54af
Author: Master <zaitcev@xxxxxxxxxxxxxxxxxx>
Date:   Thu May 20 21:19:48 2010 -0600

    Fix the leak of suddenly closed connections.

diff --git a/server/server.c b/server/server.c
index a2dc656..07d0375 100644
--- a/server/server.c
+++ b/server/server.c
@@ -399,6 +399,13 @@ static bool cli_evt_dispose(struct client *cli, unsigned int events)
 
 static bool cli_evt_recycle(struct client *cli, unsigned int events)
 {
+
+	/* if write queue is not empty, we should continue to get
+	 * poll callbacks here until it is
+	 */
+	if (!list_empty(&cli->write_q))
+		return false;
+
 	cli->req_ptr = &cli->creq;
 	cli->req_used = 0;
 	cli->state = evt_read_fixed;
@@ -1303,6 +1310,12 @@ static bool tcp_cli_event(int fd, short events, void *userdata)
 	struct client *cli = userdata;
 	bool loop = false, disposing = false;
 
+	if (events & POLLHUP) {
+		cli->state = evt_dispose;
+		cli_free(cli);
+		return true;
+	}
+
 	if (events & POLLOUT)
 		tcp_cli_wr_event(fd, events & ~POLLIN, userdata);
 
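
For reference, the "unmaskable" property of POLLHUP that the second
part of the fix relies on can be observed in isolation. The sketch
below is not chunkd code: revents_after_peer_close() is a hypothetical
helper, and it assumes an AF_UNIX socketpair stands in for the TCP
connection so that the example stays self-contained. It polls with an
empty event mask after the peer has closed and returns whatever poll()
reports.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not part of chunkd).
 * Creates a connected socket pair, closes one end, then polls the
 * other end while requesting no events at all.
 * Returns the revents reported by poll(), or -1 on failure.
 */
int revents_after_peer_close(void)
{
	int sv[2];
	struct pollfd pfd;
	int rc;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	close(sv[1]);			/* the peer goes away */

	pfd.fd = sv[0];
	pfd.events = 0;			/* neither POLLIN nor POLLOUT requested */
	pfd.revents = 0;

	rc = poll(&pfd, 1, 1000);	/* wait up to one second */
	close(sv[0]);

	if (rc < 0)
		return -1;
	return pfd.revents;		/* POLLHUP is reported even though unrequested */
}
```

A caller can test the result with (revents & POLLHUP). Because poll()
ignores POLLHUP in the events field and always reports it in revents,
a check placed at the top of an event handler runs regardless of which
events the client currently has enabled.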