Re: [PATCH 6/6] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes

Jan Kara <jack@xxxxxxx> · Mon, 24 Oct 2011 15:14:08 +0200

On Thu 20-10-11 05:57:25, Christoph Hellwig wrote:
> On Fri, Oct 07, 2011 at 10:40:55PM +0200, Jan Kara wrote:
> > wakeup_flusher_threads(0) will queue work doing complete writeback for each
> > flusher thread. Thus there is not much point in submitting another work doing
> > full inode WB_SYNC_NONE writeback by writeback_inodes_sb().
> > 
> > After this change it does not make sense to call nonblocking ->sync_fs and
> > block device flush before calling sync_inodes_sb() because
> > wakeup_flusher_threads() is completely asynchronous and thus these functions
> > would be called in parallel with inode writeback running which will effectively
> > void any work they do. So we move sync_inodes_sb() call before these two
> > functions.
> 
> >  	wakeup_flusher_threads(0);
> > -	iterate_supers(sync_inodes_one_sb, &nowait);
> > +	iterate_supers(sync_inodes_one_sb, &wait);
> >  	iterate_supers(sync_fs_one_sb, &nowait);
> >  	sync_all_bdevs(nowait);
> > -	iterate_supers(sync_inodes_one_sb, &wait);
> >  	iterate_supers(sync_fs_one_sb, &wait);
> >  	sync_all_bdevs(wait);
> >  	if (unlikely(laptop_mode))
> 
> This looks a bit half-assed to me.  Why do we still do the non-blocking
> sync_all_bdevs call?  This really only starts async writeback on the
> block device inodes, which wakeup_flusher_threads already did.
  Because e.g. ext2 will dirty quite some bdev buffers while doing inode
writeback which runs in no particular order with the writeback of bdev
inode. So after sync_inodes_sb() finishes you can have too much dirty
buffers on a bdev for synchronous sync_all_bdevs() to be efficient.

But what might be the best is to do filemap_fdadawrite() on all bdevs and
then do filemap_fdatawait() on all bdevs which would solve the efficiency
issue and also don't do writeback twice unnecessarily. OK?

> Similarly I don't think the non-blocking ->sync_fs call really make
> much sense anymore here.
  Yes, so as I wrote in the introductory email, I did also measurements
where non-blocking ->sync_fs was removed and I didn't see any regression
with ext3, ext4, xfs, or btrfs. OTOH I can imagine *some* filesystem can do
an equivalent of filemap_fdatawrite() on some metadata for non-blocking
->sync_fs and filemap_fdatawrite_and_wait() on the blocking one and if
there are more such filesystems on different backing storages the
performance difference can be noticeable (actually, checking the
filesystems, JFS and Ceph seem to be doing something like this). So I
that's why I didn't include the change in the end...

> Also we really need some good comment on why the order is
> like the one chose here in this function.
  Good point, will add.

-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html