On Wed 21-11-12 12:38:21, Dave Chinner wrote:
> On Wed, Nov 21, 2012 at 01:24:59AM +0100, Jan Kara wrote:
> > On Tue 20-11-12 11:04:28, Dave Chinner wrote:
> > > On Mon, Nov 19, 2012 at 10:39:13PM +0100, Jan Kara wrote:
> > > > On Tue 13-11-12 01:36:13, Jan Kara wrote:
> > > > > When project quota gets exceeded, xfs_iomap_write_delay() ends up flushing
> > > > > inodes because ENOSPC gets returned from xfs_bmapi_delay() instead of EDQUOT.
> > > > > This makes handling of writes over project quota rather slow, as a simple
> > > > > test program shows:
> > > > >
> > > > > 	fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
> > > > > 	for (i = 0; i < 50000; i++)
> > > > > 		pwrite(fd, buf, 4096, i*4096);
> > > > >
> > > > > Writing 200 MB like this into a directory with a 100 MB project quota takes
> > > > > around 6 minutes, while it takes about 2 seconds with this patch applied.
> > > > > This actually happens in a real-world load when NFS pushes data into a
> > > > > directory which is over project quota.
> > > > >
> > > > > Fix the problem by replacing the XFS_QMOPT_ENOSPC flag with XFS_QMOPT_EPDQUOT.
> > > > > That makes xfs_trans_reserve_quota_bydquots() return a new error, EPDQUOT,
> > > > > when project quota is exceeded. xfs_bmapi_delay() then uses this flag so that
> > > > > xfs_iomap_write_delay() can distinguish real ENOSPC (requiring flushing)
> > > > > from exceeded project quota (not requiring flushing).
> > > > >
> > > > > As a side effect this patch fixes an inconsistency where e.g. xfs_create()
> > > > > returned EDQUOT even when project quota was exceeded.
> > > >   Ping? Any opinions?
> > >
> > > FWIW, it doesn't look like it'll apply to a current XFS tree:
> > >
> > > > > @@ -441,8 +442,11 @@ retry:
> > > > >  	 */
> > > > >  	if (nimaps == 0) {
> > > > >  		trace_xfs_delalloc_enospc(ip, offset, count);
> > > > > -		if (flushed)
> > > > > -			return XFS_ERROR(error ? error : ENOSPC);
> > > > > +		if (flushed) {
> > > > > +			if (error == 0 || error == EPDQUOT)
> > > > > +				error = ENOSPC;
> > > > > +			return XFS_ERROR(error);
> > > > > +		}
> > > > >
> > > > >  		if (error == ENOSPC) {
> > > > >  			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > >
> > > This xfs_iomap_write_delay() looks like this now:
> > >
> > > 	/*
> > > 	 * If bmapi returned us nothing, we got either ENOSPC or EDQUOT. Retry
> > > 	 * without EOF preallocation.
> > > 	 */
> > > 	if (nimaps == 0) {
> > > 		trace_xfs_delalloc_enospc(ip, offset, count);
> > > 		if (prealloc) {
> > > 			prealloc = 0;
> > > 			error = 0;
> > > 			goto retry;
> > > 		}
> > > 		return XFS_ERROR(error ? error : ENOSPC);
> > > 	}
> > >
> > > The flushing is now way up in xfs_file_buffered_aio_write(), and the
> > > implementation of xfs_flush_inodes() has changed as well. Hence it
> > > may or may not behave differently now....
> >   OK, so I tested the latest XFS tree and the changes from commit 9aa05000
> > (changing xfs_flush_inodes()) indeed improve the performance from those
> > ~6 minutes to ~6 seconds, which I believe is good enough. Thanks for the
> > pointer! I was thinking for a while about why sync_inodes_sb() is so much
> > faster than the original XFS implementation and I believe it's because we
> > don't force the log on each sync now.
>
> I think it's better because we now don't scan every inode in the
> inode cache doing a mapping_tagged(PAGECACHE_TAG_DIRTY) check to see
> if they are dirty or not to decide whether they need writeback. The
> overhead of doing that adds up very quickly when you have lots of
> cached inodes and you are scanning them for every page that is
> written to....
  That may be true for other workloads, but in my case I had a test case where
I'm pretty sure only a couple of inodes were in the cache (I had just mounted
a filesystem and beat on one file in one directory on it).

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
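
For reference, below is a self-contained version of the reproducer quoted at
the top of the thread. The three-line core is verbatim from the commit
message; everything around it (includes, the buffer, argument and error
checking) is filled in as a plausible assumption, so treat it as a sketch
rather than the exact program used for the timings.

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int
	main(int argc, char **argv)
	{
		char	buf[4096];
		int	fd, i;

		if (argc != 2) {
			fprintf(stderr, "usage: %s <file>\n", argv[0]);
			return 1;
		}
		/* Fill the buffer so we are not writing uninitialized data. */
		memset(buf, 0xab, sizeof(buf));

		fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* 50000 * 4096 bytes = ~200 MB, about twice the quota limit. */
		for (i = 0; i < 50000; i++)
			pwrite(fd, buf, 4096, i*4096);

		close(fd);
		return 0;
	}

To exercise the slow path it should be pointed at a file inside a directory
that is under a project quota smaller than 200 MB (the thread used a 100 MB
limit); the file path is whatever suits your setup.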