On Wed 29-06-11 13:55:34, Christoph Hellwig wrote:
> On Wed, Jun 29, 2011 at 06:57:14PM +0200, Jan Kara wrote:
> > > For sys_sync I'm pretty sure we could simply remove the
> > > writeback_inodes_sb call and get just as good if not better performance,
> >   Actually, it won't with current code. Because WB_SYNC_ALL writeback
> > currently has the peculiarity that it looks like:
> >   for all inodes {
> >     write all inode data
> >     wait for inode data
> >   }
> > while to achieve good performance we actually need something like
> >   for all inodes
> >     write all inode data
> >   for all inodes
> >     wait for inode data
> > It makes a difference in an order of magnitude when there are lots of
> > smallish files - SLES had a bug like this so I know from user reports ;)
>
> I don't think that's true.  The WB_SYNC_ALL writeback is done using
> sync_inodes_sb, which operates as:
>
>   for all dirty inodes in bdi:
>      if inode belongs to sb
>         write all inode data
>
>   for all inodes in sb:
>      wait for inode data
>
> we still do that in a big for each sb loop, though.
  True but writeback_single_inode() has in it:
        if (wbc->sync_mode == WB_SYNC_ALL) {
                int err = filemap_fdatawait(mapping);
                if (ret == 0)
                        ret = err;
        }
  So we end up waiting much earlier. Probably we should remove this wait
but that will need some audit I guess.

> >   You mean that sync(1) would actually write the data itself? It would
> > certainly make some things simpler but it has its problems as well - for
> > example sync racing with flusher thread writing back inodes can create
> > rather bad IO pattern...
>
> Only the second pass.  The idea is that we first try to use the flusher
> threads for good I/O patterns, but if we can't get that to work only
> block the caller and not everyone.  But that's just an idea so far,
> it would need serious benchmark.  And despite what I claimed before
> we actually do the wait in the caller context already anyway, which
> already gives you the easy part of the above effect.
  Modulo the writeback_single_inode() wait. But if that is dealt with I
agree.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
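
To illustrate why the two loop shapes discussed in this thread differ so
much for many small files, here is a toy userspace model in C. It is only a
sketch under stated assumptions and not kernel code: struct toy_dev,
toy_write(), toy_wait(), and the xfer/lat parameters are all hypothetical.
The assumed device serves requests one at a time (xfer ticks each) and
reports each completion lat ticks after service ends, so a caller that
waits after every write pays the full latency once per inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy device, not a kernel structure. Time is a virtual tick
 * counter; nothing here sleeps or performs real I/O. */
struct toy_dev {
	unsigned long now;        /* caller's virtual clock */
	unsigned long busy_until; /* tick at which queued work is served */
	unsigned long xfer;       /* service time per request */
	unsigned long lat;        /* completion latency per request */
};

/* Queue one inode's data asynchronously; return its completion tick. */
static unsigned long toy_write(struct toy_dev *d)
{
	unsigned long start = d->busy_until > d->now ? d->busy_until : d->now;

	d->busy_until = start + d->xfer;
	return d->busy_until + d->lat;
}

/* Block the caller until the given completion tick. */
static void toy_wait(struct toy_dev *d, unsigned long done)
{
	if (done > d->now)
		d->now = done;
}

/* Shape 1: for all inodes { write all inode data; wait for inode data } */
unsigned long sync_interleaved(size_t n, unsigned long xfer, unsigned long lat)
{
	struct toy_dev d = { 0, 0, xfer, lat };

	for (size_t i = 0; i < n; i++)
		toy_wait(&d, toy_write(&d));
	return d.now;
}

/* Shape 2: for all inodes write; then for all inodes wait.
 * Returns 0 if the bookkeeping array cannot be allocated. */
unsigned long sync_two_pass(size_t n, unsigned long xfer, unsigned long lat)
{
	struct toy_dev d = { 0, 0, xfer, lat };
	unsigned long *done;

	assert(n > 0);
	done = malloc(n * sizeof(*done));
	if (!done)
		return 0;

	for (size_t i = 0; i < n; i++)	/* pass 1: submit everything */
		done[i] = toy_write(&d);
	for (size_t i = 0; i < n; i++)	/* pass 2: wait for everything */
		toy_wait(&d, done[i]);

	free(done);
	return d.now;
}
```

In this model, 1000 inodes with xfer = 1 and lat = 9 cost 10000 ticks for
sync_interleaved() and 1009 ticks for sync_two_pass(), i.e. roughly the
order-of-magnitude gap mentioned above. Real devices and the real
writeback path are of course more complicated than this.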