On Sat, Jun 21, 2008 at 09:21:42AM -0500, Jody French (jfrench@xxxxxxxxxxxxx) wrote:
> In the particular case we are looking at, the network stack (TCP perhaps
> due to a temporary glitch in the network adapter or routing
> infrastructure or temporary memory pressure) is returning EAGAIN
> for more than 15 seconds (on the tcp send of the Write request) but the
> server itself has not crashed, (subsequent parts of the file written via
> later writepages requests are eventually written out), eventually
> we give up in writepages and return EIO on the next fsync or flush/close
> - but if we could make one more attempt to go through in flush, and
> write all dirty pages including the ones that we timed out on that
> would help. In addition if readpage is about to do a partial page read
> into a dirty page that we were unable to write out we would like to try
> once more before corrupting the data.

If you do not unlock and release the page, nothing can corrupt it, but
with a 15-second timeout I'm not sure that writepage plus flush gives a
long enough interval to outlast the problem. In the flush you can switch
to nonblocking mode and set the socket timeout to, for example, 30 seconds,
and only if even that fails, discard the data.

EAGAIN likely means a problem on the server, which I referred to as
non-serious, and it will likely resume in a few moments, so your trick
with write+flush can work, but only with a long enough timeout.

Actually you can always perform a similar trick with the socket timeout,
but beware of problems with umount or sync, which can then take too long
to complete, so there should be some flag to show when you do and do not
want it. A similar scheme was implemented in POHMELFS.

Returning an error is, I think, the last thing to do, and whatever retry
mechanism you decide to implement, it is worth the effort.
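
To make the idea concrete, here is a rough userspace sketch, not kernel
code and not what CIFS or POHMELFS actually do. It uses a blocking socket
with SO_SNDTIMEO instead of the nonblocking mode mentioned above, and the
names RETRY_ALLOW_LONG_WAIT, set_send_timeout() and
send_all_with_timeout() are made up for illustration. The 15 and 30
second values are just the numbers from this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical flag: the caller (e.g. flush) accepts a long wait.
 * Callers such as umount or sync would leave it clear. */
#define RETRY_ALLOW_LONG_WAIT 0x1

/* Assumed values taken from the discussion, not from any real code. */
#define SHORT_TIMEOUT_SEC 15
#define LONG_TIMEOUT_SEC  30

/* Set the send timeout on a blocking socket. */
static int set_send_timeout(int fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Send the whole buffer. Returns 0 on success, -1 with errno set.
 * With SO_SNDTIMEO set, send() fails with EAGAIN/EWOULDBLOCK once the
 * timeout expires, and only then does the caller give up on the data. */
static int send_all_with_timeout(int fd, const void *buf, size_t len,
                                 unsigned int flags)
{
    const char *p = buf;
    long timeout = (flags & RETRY_ALLOW_LONG_WAIT) ?
                   LONG_TIMEOUT_SEC : SHORT_TIMEOUT_SEC;

    if (set_send_timeout(fd, timeout) < 0)
        return -1;

    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;      /* timeout expired or hard error */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A flush path would pass RETRY_ALLOW_LONG_WAIT, while umount or sync would
pass 0 so that they do not hang for too long.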

-- 
	Evgeniy Polyakov