On Thu, 17 Mar 2011 19:53:07 -0400 Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote: > On Wed, 2011-02-16 at 17:15 +1100, NeilBrown wrote: > > Hi Trond, > > I wonder if I might get your help/advice on an issue with NFS. > > > > It seems that NFS_DATA_SYNC is hardly used at all currently. It is used for > > O_DIRECT writes and for writes 'for_reclaim', and for handling some error > > conditions, but that is about it. > > > > This appears to be a regression. > > > > Back in 2005, commit ab0a3dbedc5 in 2.6.13 says: > > > > [PATCH] NFS: Write optimization for short files and small O_SYNC writes. > > > > Use stable writes if we can see that we are only going to put a single > > write on the wire. > > > > which seems like a sensible optimisation, and we have a customer which > > values it. Very roughly, they have an NFS server which optimises 'unstable' > > writes for throughput and 'stable' writes for latency - these seems like a > > reasonable approach. > > With a 2.6.16 kernel an application which generates many small sync writes > > gets adequate performance. In 2.6.32 they see unstable writes followed by > > commits, which cannot be (or at least aren't) optimised as well. > > > > It seems this was changed by commit c63c7b0513953 > > > > NFS: Fix a race when doing NFS write coalescing > > > > in 2.6.22. > > > > Is it possible/easy/desirable to get this behaviour back. i.e. to use > > NFS_DATA_SYNC at least on sub-page writes triggered by a write to an > > O_SYNC file. > > > > My (possibly naive) attempt is as follows. It appears to work as I expect > > (though it still uses SYNC for 1-page writes) but I'm not confident that it > > is "right". > > > > Thanks, > > > > NeilBrown > > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c > > index 10d648e..392bfa8 100644 > > --- a/fs/nfs/write.c > > +++ b/fs/nfs/write.c > > @@ -178,6 +178,9 @@ static int wb_priority(struct writeback_control *wbc) > > return FLUSH_HIGHPRI | FLUSH_STABLE; > > if (wbc->for_kupdate || wbc->for_background) > > return FLUSH_LOWPRI; > > + if (wbc->sync_mode == WB_SYNC_ALL && > > + (wbc->range_end - wbc->range_start) < PAGE_SIZE) > > + return FLUSH_STABLE; > > return 0; > > } > > Would it ever be wrong to set the FILE_SYNC flag for the very last rpc > call in a writeback series? I'm thinking that we might want to set > FLUSH_STABLE before the call to pageio_complete in > nfs_writepage_locked() and nfs_writepages(). Interesting idea. Am I correct in assuming you only mean if wbc->sync_mode == WB_SYNC_ALL. It wouldn't seem to make any sense for WB_SYNC_NONE. In that case that last RPC would be immediately followed by a COMMIT. So it could be reasonable to make it NFS_DATA_SYNC. However the server would be seeing something a bit odd - a sequence of unstable writes, then a stable write, then a commit. This could cause it to 'sync' things in the 'wrong' order which might be less than optimal. It would depend a lot on the particular server and filesystem of course, but it seems to be mis-communicating. So I think I would avoid this approach (assuming I understand it correctly). > > The only thing that makes me uncomfortable with that idea is the > possible repercussions for pNFS. > Cannot comment there - I don't have a deep enough understanding of the issues with pNFS. Thanks, NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html