On 05/21/2010 12:54 AM, Pete Zaitcev wrote:
After a period of uptime, chunkd may stop working with this:

May 20 08:51:47 azdragon2 chunkd[4034]: tcp accept: Too many open files

An examination with lsof shows that file descriptors for sockets and object data files are leaked in neat pairs.

As it turns out, the root cause is not processing the case when tabled opens a connection to read an object, then closes it before the data is transferred. On some systems, sendfile returns no error in such a case, but rather the amount of data that it attempted to send before it recognized that the socket was closed. If that happens, chunkd will not receive a POLLOUT indication, and the struct cli will linger forever with a non-empty write queue.

The fix has two parts:

1. Permit a client in the evt_recycle state to process outstanding writes in the same manner a client in evt_dispose does. Note that in our specific failure case no actual processing is going to occur, so this part has the effect of permitting the dispatch to work. If we do not do this, a POLLIN may throw us into the evt_read_fixed stage.

2. Once we're getting dispatched, dispose of clients whose connections were closed, using the unmaskable POLLHUP bit.

As an aside, tabled 0.5-0.7.x resets the connections when Firefox asks for a file that was modified after a certain date. In that case, tabled wants to know when the file was modified, so it reads the header off chunkd. If it turns out that the client is not interested in the data, tabled simply closes the connection without reading whatever data has arrived. This may change in the future, but the bug in chunkd should be fixed anyway, for general robustness.

Signed-off-by: Pete Zaitcev <zaitcev@xxxxxxxxxx>
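For readers following along, here is a minimal sketch, not the actual chunkd code, of the second part of the fix: a dispatch path that checks the unmaskable POLLHUP bit and disposes of a client whose peer has hung up even though its write queue is still non-empty. struct cli, cli_dispatch() and the write_queued flag below are simplified placeholders for illustration only.

/*
 * Minimal sketch, not the actual chunkd code: checking the unmaskable
 * POLLHUP bit in the dispatch path lets a server dispose of a client
 * whose peer closed the connection while writes were still queued.
 */
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

struct cli {
	int	fd;
	bool	write_queued;	/* stands in for a non-empty write queue */
};

/* One dispatch pass for a single client; returns true if it must be freed. */
static bool cli_dispatch(struct cli *cli, short revents)
{
	/*
	 * POLLHUP and POLLERR cannot be masked out through the events field;
	 * once the peer hangs up they are reported on every poll().  Without
	 * this check, a client whose sendfile() returned a byte count rather
	 * than an error would wait forever for a POLLOUT that never comes,
	 * leaking both its socket and its object data descriptor.
	 */
	if (revents & (POLLHUP | POLLERR))
		return true;

	if ((revents & POLLOUT) && cli->write_queued) {
		/* real code would drain the write queue here */
	}

	return false;
}

int main(void)
{
	int sv[2];
	struct cli cli = { .write_queued = true };
	struct pollfd pfd;

	/* Simulate the reported failure: the peer closes the connection
	 * before the queued object data has been sent. */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;
	cli.fd = sv[0];
	close(sv[1]);

	pfd.fd = cli.fd;
	pfd.events = POLLIN | POLLOUT;

	if (poll(&pfd, 1, -1) > 0 && cli_dispatch(&cli, pfd.revents)) {
		printf("peer hung up, disposing of client fd %d\n", cli.fd);
		close(cli.fd);
	}
	return 0;
}

The socketpair() in main() only simulates a peer that closes early; in chunkd the descriptor in question would be the accepted TCP connection from tabled.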
applied 1-6, after fixing truncation bug newly introduced