Trond Myklebust wrote:
> On Tue, 2009-06-09 at 18:32 -0400, Peter Staubach wrote:
>> I still need to move this along.
>
> Sorry, it has been a long week at home (state championships,
> graduation...).

State championships?  How did they go?

> I did promise to send a dump of the state of the fstatat() stuff from
> LSF (see attachments).

Thanx!  Seems fairly straightforward.

> As for the patch you posted, I did have comments that haven't really
> been addressed. As I said, I certainly don't see the need to have
> write() wait for writebacks to complete. I also don't accept that we
> need to treat random writes as fundamentally different from serial
> writes.

Sorry about not addressing your comments adequately.

Are you referring to nfs_wait_for_outstanding_writes(), or do you see
someplace else that write() is waiting for writebacks to complete?
Perhaps I should have named it
nfs_wait_for_too_many_outstanding_writes()?  :-)  Waiting on every
writeback was certainly not the intention.  The intention was to have
the pages gathered and then the over-the-wire work handled
asynchronously.  If that is not what the patch does, then I need to do
some more work.

A goal of this work is to better match the bandwidth offered by the
network/server/storage with the rate at which applications can create
dirty pages.  It is not good for the application to get too far ahead
and dirty too many pages.  That leads to the current problem with
stat() as well as to much nastier out-of-memory conditions.  If the
system is not capable of cleaning more than N GB/second, then it does
not make sense for applications to dirty more than that same
N GB/second.  In the end, they won't be able to do so anyway, so why
tie up the memory and possibly cause problems?

I see random access as being different from sequential access mostly
because of the expectations that the two styles of application have.
Applications which access a file sequentially typically do not expect
to touch the pages again after either reading or writing them.
This does not mean that we should toss those pages from the page
cache, but it does mean that we can start writing them: the chances of
the application returning to update their contents are minimal, and
the pages will need to be written anyway.  Applications that use
random access patterns are much more likely to return to existing
pages and modify them a second time.  Proactively writing those pages
means issuing multiple over-the-wire writes where fewer would have
sufficed had we waited.

> I'm currently inclining towards adding a switch to turn off strict posix
> behaviour. There weren't too many people asking for it earlier, and
> there aren't that many applications out there that are sensitive to the
> exact mtime. Samba and backup applications are the major exceptions to
> that rule, but you don't really run those on top of NFS clients if you
> can avoid it...

While I think that such a switch is an okay idea and will help some
applications that are modified to use it, it does not help existing
applications, nor applications that want correct time values and
reasonable performance at the same time.  I believe that we can help
all applications by reviewing the page cache handling architecture of
the NFS client.

Thanx...

		ps
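[Editor's note: the two policies argued for above — throttling a writer only once "too many" writes are outstanding, and treating sequential and random writers differently — can be modeled with a short userspace C sketch. The struct, function names, and thresholds below are invented for illustration; this is not the actual NFS client code or the patch under discussion.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file write-pattern state, modeling the discussion
 * above.  All names and limits are illustrative only. */
struct file_write_state {
	size_t prev_end;	/* offset just past the previous write */
	size_t outstanding;	/* bytes dirtied but not yet cleaned */
	int    sequential;	/* heuristic: did the last write extend
				 * the previous one? */
};

/* Classify a write: sequential if it begins exactly where the previous
 * write ended.  Sequential pages are unlikely to be rewritten, so they
 * can be flushed early; random-access pages are held back, since
 * flushing them proactively can turn one over-the-wire write into
 * several for the same page. */
static void note_write(struct file_write_state *fs, size_t off, size_t len)
{
	fs->sequential = (off == fs->prev_end);
	fs->prev_end = off + len;
	fs->outstanding += len;
}

/* Throttle check: write() does not wait for writebacks to complete;
 * the writer only blocks once too many bytes are outstanding, keeping
 * the dirtying rate near the rate at which the network/server/storage
 * can clean pages. */
static int should_throttle(const struct file_write_state *fs, size_t limit)
{
	return fs->outstanding > limit;
}
```

Under this model, a writer calls note_write() on each write(); when should_throttle() returns true it sleeps until enough writeback completes, rather than sleeping on every write.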