Re: [PATCH 2/2] xfs: Call filemap_flush_range() for async xfs_flush_pages() call

On Wed 03-08-11 18:34:39, Vivek Goyal wrote:
> On Wed, Aug 03, 2011 at 10:49:05PM +0200, Jan Kara wrote:
> > XFS does its own data writeout from several places using xfs_flush_pages().
> > When this writeout is racing with flusher thread writing the same inode the
> > performance gets bad because flusher thread submits writes as normal WRITE
> > commands while xfs_flush_pages() submits them as WRITE_SYNC (as it uses
> > filemap_fdatawrite_range()) and CFQ does not merge such requests. So we end up
> > with relatively small requests from the flusher thread and sync interleaved.
> > Short excerpt from blktrace:
> > 254,16   2    16949   103.786278461 25233  Q   W 54479888 + 424 [flush-254:16]
> > 254,16   4    20662   103.786386751 25232  Q  WS 54492920 + 1024 [dd]
> > 254,16   4    20665   103.786444241 25232  Q  WS 54493944 + 104 [dd]
> > 254,16   2    16952   103.786533451 25233  Q   W 54494048 + 288 [flush-254:16]
> > 254,16   4    20668   103.787295241 25232  Q  WS 54494336 + 1024 [dd]
> > 254,16   4    20671   103.787332801 25232  Q  WS 54495360 + 216 [dd]
> > 254,16   2    16955   103.789062500 25233  Q   W 54495576 + 1024 [flush-254:16]
> > (actually, I was observing even smaller requests on a different HW which isn't
> > available to me now).
> 
> Do you have the rest of the blktrace messages? I am curious whether, apart
> from the smaller request size, CFQ idling might also be hurting this
> IO pattern. (grep for "fired" in your blktrace and see how many times
> the idle timer fired.)
  At least in the trace I have (which is sadly not from the machine where
the problem was really bad - but that was with a 2.6.32 kernel anyway) the
idle timer is not a problem. Apparently we are not idling because I see
messages like
254,16   1        0   103.816811760     0  m   N cfq25232 Not idling. st->count:1
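
For anyone who wants to repeat this check on their own trace, here is a
minimal userspace sketch, not part of the patch. It assumes the default
blkparse text layout shown in the excerpts in this mail, with any "> "
quoting stripped first. The names q_stats, parse_queue_event(),
account_line() and is_idle_timer_fired() are made up for illustration.
It sums the sizes of queued writes separately for plain W and for WS, and
does the equivalent of the suggested grep for "fired".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Running totals for one class of queued write requests. */
struct q_stats {
	unsigned long count;	/* number of Q events seen */
	unsigned long sectors;	/* sum of request sizes in sectors */
};

/*
 * Parse one line of blkparse text output, for example
 *   254,16   4    20662   103.786386751 25232  Q  WS 54492920 + 1024 [dd]
 * Returns 1 and fills rwbs / nsect for a queue ("Q") event, 0 otherwise.
 * Message lines (action "m") fail the numeric match and are skipped.
 */
static int parse_queue_event(const char *line, char rwbs[8],
			     unsigned long *nsect)
{
	char action[8];
	unsigned long long sector;
	int n;

	n = sscanf(line, "%*d,%*d %*d %*d %*f %*d %7s %7s %llu + %lu",
		   action, rwbs, &sector, nsect);
	return n == 4 && strcmp(action, "Q") == 0;
}

/* Add a queued write to the async (W) or sync (WS) totals. */
static void account_line(const char *line, struct q_stats *async_w,
			 struct q_stats *sync_w)
{
	char rwbs[8];
	unsigned long nsect;
	struct q_stats *s;

	assert(line && async_w && sync_w);
	if (!parse_queue_event(line, rwbs, &nsect) || !strchr(rwbs, 'W'))
		return;
	s = strchr(rwbs, 'S') ? sync_w : async_w;
	s->count++;
	s->sectors += nsect;
}

/* Same test as grepping the trace for "fired". */
static int is_idle_timer_fired(const char *line)
{
	return strstr(line, "fired") != NULL;
}
```

Feeding every line of the trace through account_line() (e.g. in an fgets()
loop) and dividing sectors by count gives the average queued request size
per class, which makes the interleaving of small W and WS requests easy to
see without reading the trace by eye.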


> We probably are idling on the "dd" thread. As long as dd writes are not
> dependent on async writes, things will still be fine as dd will continue
> to make progress and async writes will not make much progress.
> 
> But if that's not the case and after submitting a bunch of synchronous
> writes dd waits for some writes to finish (writes submitted by flusher
> threads), then idling will start hurting.
  Umm, we have to wait for writes submitted by the flusher thread only when
we went through the whole file and are in fdatawait() now. So at least for
big files this won't be an issue. But thanks for your suggestions anyway.

									Honza
> > Although we cannot fix all the cases (e.g. when fsync is called), we can use
> > filemap_flush_range() for the case when xfs_flush_pages() is called with
> > XBF_ASYNC flag which fixes the problem at least for some common cases like
> > the first round of sync writeback or flushing of data during truncate or after
> > file is closed.
> > 
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > ---
> >  fs/xfs/linux-2.6/xfs_fs_subr.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/linux-2.6/xfs_fs_subr.c b/fs/xfs/linux-2.6/xfs_fs_subr.c
> > index ed88ed1..6f33164 100644
> > --- a/fs/xfs/linux-2.6/xfs_fs_subr.c
> > +++ b/fs/xfs/linux-2.6/xfs_fs_subr.c
> > @@ -70,10 +70,12 @@ xfs_flush_pages(
> >  	int		ret2;
> >  
> >  	xfs_iflags_clear(ip, XFS_ITRUNCATED);
> > +	if (flags & XBF_ASYNC) {
> > +		return -filemap_flush_range(mapping, first,
> > +				last == -1 ? LLONG_MAX : last);
> > +	}
> >  	ret = -filemap_fdatawrite_range(mapping, first,
> >  				last == -1 ? LLONG_MAX : last);
> > -	if (flags & XBF_ASYNC)
> > -		return ret;
> >  	ret2 = xfs_wait_on_pages(ip, first, last);
> >  	if (!ret)
> >  		ret = ret2;
> > -- 
> > 1.7.1
> > 
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR