On Wed, Nov 21, 2012 at 01:24:59AM +0100, Jan Kara wrote:
> On Tue 20-11-12 11:04:28, Dave Chinner wrote:
> > On Mon, Nov 19, 2012 at 10:39:13PM +0100, Jan Kara wrote:
> > > On Tue 13-11-12 01:36:13, Jan Kara wrote:
> > > > When project quota gets exceeded, xfs_iomap_write_delay() ends up flushing
> > > > inodes because ENOSPC gets returned from xfs_bmapi_delay() instead of EDQUOT.
> > > > This makes handling of writes over project quota rather slow, as a simple test
> > > > program shows:
> > > >   fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
> > > >   for (i = 0; i < 50000; i++)
> > > >     pwrite(fd, buf, 4096, i*4096);
> > > >
> > > > Writing 200 MB like this into a directory with a 100 MB project quota takes
> > > > around 6 minutes, while it takes about 2 seconds with this patch applied. This
> > > > actually happens in a real-world load when nfs pushes data into a directory
> > > > which is over project quota.
> > > >
> > > > Fix the problem by replacing the XFS_QMOPT_ENOSPC flag with XFS_QMOPT_EPDQUOT.
> > > > That makes xfs_trans_reserve_quota_bydquots() return the new error EPDQUOT when
> > > > project quota is exceeded. xfs_bmapi_delay() then uses this flag so that
> > > > xfs_iomap_write_delay() can distinguish real ENOSPC (requiring flushing)
> > > > from exceeded project quota (not requiring flushing).
> > > >
> > > > As a side effect this patch fixes an inconsistency where e.g. xfs_create()
> > > > returned EDQUOT even when project quota was exceeded.
> > >   Ping? Any opinions?
> >
> > FWIW, it doesn't look like it'll apply to a current XFS tree:
> >
> > > > @@ -441,8 +442,11 @@ retry:
> > > >  	 */
> > > >  	if (nimaps == 0) {
> > > >  		trace_xfs_delalloc_enospc(ip, offset, count);
> > > > -		if (flushed)
> > > > -			return XFS_ERROR(error ? error : ENOSPC);
> > > > +		if (flushed) {
> > > > +			if (error == 0 || error == EPDQUOT)
> > > > +				error = ENOSPC;
> > > > +			return XFS_ERROR(error);
> > > > +		}
> > > >
> > > >  		if (error == ENOSPC) {
> > > >  			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >
> > This xfs_iomap_write_delay() looks like this now:
> >
> > 	/*
> > 	 * If bmapi returned us nothing, we got either ENOSPC or EDQUOT. Retry
> > 	 * without EOF preallocation.
> > 	 */
> > 	if (nimaps == 0) {
> > 		trace_xfs_delalloc_enospc(ip, offset, count);
> > 		if (prealloc) {
> > 			prealloc = 0;
> > 			error = 0;
> > 			goto retry;
> > 		}
> > 		return XFS_ERROR(error ? error : ENOSPC);
> > 	}
> >
> > The flushing is now way up in xfs_file_buffered_aio_write(), and the
> > implementation of xfs_flush_inodes() has changed as well. Hence it
> > may or may not behave differently now....
>   OK, so I tested the latest XFS tree, and the changes by commit 9aa05000
> (changing xfs_flush_inodes()) indeed improve the performance from those
> ~6 minutes to ~6 seconds, which is good enough I believe. Thanks for the
> pointer! I was thinking for a while about why sync_inodes_sb() is so much
> faster than the original XFS implementation, and I believe it's because we
> don't force the log on each sync now.

I think it's better because we now don't scan every inode in the inode
cache doing a mapping_tagged(PAGECACHE_TAG_DIRTY) check to see if they
are dirty or not to decide whether they need writeback. The overhead of
doing that adds up very quickly when you have lots of cached inodes and
you are scanning them for every page that is written to....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
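
For anyone who wants to exercise the slow path being discussed, a standalone
sketch of the test program quoted from Jan's patch description is below. It is
only a sketch: the buffer contents, argument handling, and error checks are
assumptions added to make it self-contained, and the file's parent directory is
assumed to already have a project quota limit smaller than 200 MB (e.g. 100 MB,
set up beforehand with xfs_quota). Timing a run against such a directory, with
and without the flushing changes, is what the minutes-versus-seconds numbers in
the thread refer to.

    /*
     * Sketch of the reproducer described above: write ~200 MB in 4 KiB
     * pwrite() calls to a file whose parent directory is assumed to have
     * a smaller project quota already configured (e.g. via xfs_quota).
     */
    #define _XOPEN_SOURCE 500
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
    	static char buf[4096];
    	int fd, i;

    	if (argc < 2) {
    		fprintf(stderr, "usage: %s <file-in-project-quota-dir>\n", argv[0]);
    		return 1;
    	}

    	memset(buf, 'a', sizeof(buf));

    	fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    	if (fd < 0) {
    		perror("open");
    		return 1;
    	}

    	/* 50000 writes of 4 KiB each, i.e. well past a 100 MB limit. */
    	for (i = 0; i < 50000; i++) {
    		if (pwrite(fd, buf, sizeof(buf), (off_t)i * 4096) < 0) {
    			/* EDQUOT (or ENOSPC) is expected once the limit is hit. */
    			perror("pwrite");
    			break;
    		}
    	}

    	close(fd);
    	return 0;
    }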