Re: Odd locking pattern introduced as part of "nowait aio support"

Ritesh Harjani <riteshh@xxxxxxxxxxxxx> · Wed, 11 Sep 2019 16:01:14 +0530

Hi,

On 9/11/19 3:09 PM, Andres Freund wrote:
Hi,

On 2019-09-11 14:04:20 +1000, Dave Chinner wrote:
On Tue, Sep 10, 2019 at 03:33:27PM -0700, Andres Freund wrote:
Hi,

Especially with buffered io it's fairly easy to hit contention on the
inode lock, during writes. With something like io_uring, it's even
easier, because it currently (but see [1]) farms out buffered writes to
workers, which then can easily contend on the inode lock, even if only
one process submits writes.  But I've seen it in plenty other cases too.

Looking at the code I noticed that several parts of the "nowait aio
support" (cf 728fbc0e10b7f3) series introduced code like:

static ssize_t
ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
...
	if (!inode_trylock(inode)) {
		if (iocb->ki_flags & IOCB_NOWAIT)
			return -EAGAIN;
		inode_lock(inode);
	}

The ext4 code is just buggy here - we don't support RWF_NOWAIT on
buffered write >
But both buffered and non-buffered writes go through
ext4_file_write_iter(). And there's a preceding check, at least these
days, preventing IOCB_NOWAIT to apply to buffered writes:

	if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT))
		return -EOPNOTSUPP;

-EOPNOTSUPP is now taken care in ext4 iomap patch series as well.

I do really wish buffered NOWAIT was supported... The overhead of having
to do async buffered writes through the workqueue in io_uring, even if
an already existing page is targeted, is quite noticable. Even if it
failed with EAGAIN as soon as the buffered write's target isn't in the
page cache, it'd be worthwhile.

isn't trylocking and then locking in a blocking fashion an inefficient
pattern? I.e. I think this should be

	if (iocb->ki_flags & IOCB_NOWAIT) {
		if (!inode_trylock(inode))
			return -EAGAIN;
	}
         else
         	inode_lock(inode);

Yes, you are right.

History: commit 91f9943e1c7b ("fs: support RWF_NOWAIT
for buffered reads") which introduced the first locking pattern
you describe in XFS.

Seems, as part of the nowait work, the pattern was introduced in ext4,
xfs and btrfs. And fixed in xfs.

That was followed soon after by:

commit 942491c9e6d631c012f3c4ea8e7777b0b02edeab
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Mon Oct 23 18:31:50 2017 -0700

     xfs: fix AIM7 regression

     Apparently our current rwsem code doesn't like doing the trylock, then
     lock for real scheme.  So change our read/write methods to just do the
     trylock for the RWF_NOWAIT case.  This fixes a ~25% regression in
     AIM7.

     Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
     Reported-by: kernel test robot <xiaolong.ye@xxxxxxxxx>
     Signed-off-by: Christoph Hellwig <hch@xxxxxx>
     Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
     Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Which changed all the trylock/eagain/lock patterns to the second
form you quote. None of the other filesystems had AIM7 regressions
reported against them, so nobody changed them....

:(

Obviously this isn't going to improve scalability to a very significant
degree. But not unnecessarily doing two atomic ops on a contended lock
can't hurt scalability either. Also, the current code just seems
confusing.

Am I missing something?

Just that the sort of performance regression testing that uncovers
this sort of thing isn't widely done, and most filesystems are
concurrency limited in some way before they hit inode lock
scalability issues. Hence filesystem concurrency foccussed
benchmarks that could uncover it (like aim7) won't because the inode
locks don't end up stressed enough to make a difference to
benchmark performance.

Ok.  Goldwyn, do you want to write a patch, or do you want me to write
one up?

I am anyways looking into ext4 performance issue of mixed parallel DIO 
workload. This will require some new APIs for inode locking similar to
that of XFS.
In that I can take care of this symantics reported here by you (which is 
taken care by XFS in above patch) for ext4.

Thanks
-ritesh

Greetings,

Andres Freund