On Tue, Dec 22, 2009 at 08:35:39PM +0800, Jan Kara wrote: > > 2) NFS commit stops pipeline because it sleep&wait inside i_mutex, > > which blocks all other NFSDs trying to write/writeback the inode. > > > > nfsd_sync: > > take i_mutex > > filemap_fdatawrite > > filemap_fdatawait > > drop i_mutex > I believe this is unrelated to the problem Steve is trying to solve. > When we get to doing sync writes the performance is busted so we better > shouldn't get to that (unless user asked for that of course). Yes, first priority is always to reduce the COMMITs and the number of writeback pages they submitted under WB_SYNC_ALL. And I guess the "increase write chunk beyond 128MB" patches can serve it well. The i_mutex should impact NFS write performance for single big copy in this way: pdflush submits many (4MB write, 1 commit) pairs, because the write and commit each will take i_mutex, it effectively limits the server side io queue depth to <=4MB: the next 4MB dirty data won't reach page cache until the previous 4MB is completely synced to disk. There are two kinds of inefficiency here: - the small queue depth - the interleaved use of CPU/DISK: loop { write 4MB => normally only CPU writeback 4MB => mostly disk } When writing many small dirty files _plus_ one big file, there will still be interleaved write/writeback: the 4MB write will be broken into 8 NFS writes with the default wsize=524288. So there may be one nfsd doing COMMIT, another 7 nfsd waiting for the big file's i_mutex. All 8 nfsd are "busy" and pipeline is destroyed. Just a possibility. > > If filemap_fdatawait() can be moved out of i_mutex (or just remove > > the lock), we solve the root problem: > > > > nfsd_sync: > > [take i_mutex] > > filemap_fdatawrite => can also be blocked, but less a problem > > [drop i_mutex] > > filemap_fdatawait > > > > Maybe it's a dumb question, but what's the purpose of i_mutex here? > > For correctness or to prevent livelock? I can imagine some livelock > > problem here (current implementation can easily wait for extra > > pages), however not too hard to fix. > Generally, most filesystems take i_mutex during fsync to > a) avoid all sorts of livelocking problems > b) serialize fsyncs for one inode (mostly for simplicity) > I don't see what advantage would it bring that we get rid of i_mutex > for fdatawait - only that maybe writers could proceed while we are > waiting but is that really the problem? The i_mutex at least has some performance impact. Another one would be the WB_SYNC_ALL. All are related to the COMMIT/sync write behavior. Are there some other _direct_ causes? Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html