On Tue, Jun 27, 2023 at 11:16:01AM -0600, Jens Axboe wrote:
> On 6/26/23 8:59 PM, Jens Axboe wrote:
> > On 6/26/23 8:05 PM, Kent Overstreet wrote:
> >> On Mon, Jun 26, 2023 at 07:13:54PM -0600, Jens Axboe wrote:
> >>> Doesn't reproduce for me with XFS. The above ktest doesn't work
> >>> for me either:
> >>
> >> It just popped for me on xfs, but it took half an hour or so of
> >> looping vs. 30 seconds on bcachefs.
> >
> > OK, I'll try and leave it running overnight and see if I can get it
> > to trigger.
>
> I did manage to reproduce it, and also managed to get bcachefs to run
> the test. But I had to add:
>
> diff --git a/check b/check
> index 5f9f1a6bec88..6d74bd4933bd 100755
> --- a/check
> +++ b/check
> @@ -283,7 +283,7 @@ while [ $# -gt 0 ]; do
>      case "$1" in
>      -\? | -h | --help) usage ;;
>
> -    -nfs|-afs|-glusterfs|-cifs|-9p|-fuse|-virtiofs|-pvfs2|-tmpfs|-ubifs)
> +    -nfs|-afs|-glusterfs|-cifs|-9p|-fuse|-virtiofs|-pvfs2|-tmpfs|-ubifs|-bcachefs)
>          FSTYP="${1:1}"
>          ;;
>      -overlay)

I wonder if this is due to an upstream fstests change I haven't seen
yet; I'll have a look.

> to ktest/tests/xfstests/ and run it with -bcachefs, otherwise it kept
> failing because it assumed it was XFS.
>
> I suspected this was just a timing issue, and it looks like that's
> exactly what it is. Looking at the test case, it'll randomly kill -9
> fsstress, and if that happens while we have io_uring IO pending, then
> we process completions inline (for a PF_EXITING current). This means
> they get pushed to fallback work, which runs out of line. If we hit
> that case AND the timing is such that it hasn't been processed yet,
> we'll still be holding a file reference under the mount point and
> umount will -EBUSY fail.
>
> As far as I can tell, this can happen with aio as well, it's just
> harder to hit. If the fput happens while the task is exiting, then
> fput will end up being delayed through a workqueue as well. The test
> case assumes that once it's reaped the exit of the killed task that
> all files are released, which isn't necessarily true if they are done
> out-of-line.

Yeah, I traced it through to the delayed fput code as well. I'm not
sure delayed fput is responsible here; what I learned when I was
tracking this down has mostly fallen out of my brain, so take anything
I say with a large grain of salt. But I believe I tested with
delayed_fput completely disabled, and found another thing in io_uring
with the same effect as delayed_fput that wasn't being flushed.

> For io_uring specifically, it may make sense to wait on the fallback
> work. The below patch does this, and should fix the issue. But I'm
> not fully convinced that this is really needed, as I do think this
> can happen without io_uring as well. It just doesn't right now as the
> test does buffered IO, and aio will be fully sync with buffered IO.
> That means there's either no gap where aio will hit it without
> O_DIRECT, or it's just small enough that it hasn't been hit.

I just tried your patch and I still have generic/388 failing - it
might've taken a bit longer to pop this time.

I wonder if there might be a better way of solving this, though?

For aio, when a process is exiting we just synchronously tear down the
ioctx, including waiting for outstanding iocbs.

delayed_fput, even though I believe it's not responsible here, seems
sketchy to me because there doesn't seem to be a straightforward way
to flush delayed fputs for a given _process_ - there's a single global
work item, and we can only flush globally (see the sketch at the end
of this mail).

Would what aio does work here?
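To be concrete about "what aio does": here's the rough shape of aio's
exit path, paraphrased from memory of fs/aio.c (exit_aio() and
kill_ioctx()) - treat it as a sketch, the details may not match
current mainline exactly:

/* Sketch paraphrasing fs/aio.c - struct kioctx, struct ctx_rq_wait
 * and kill_ioctx() are fs/aio.c internals, from memory: */
void exit_aio(struct mm_struct *mm)
{
	struct kioctx_table *table = rcu_dereference_raw(mm->ioctx_table);
	struct ctx_rq_wait wait;
	int i, skipped;

	if (!table)
		return;

	atomic_set(&wait.count, table->nr);
	init_completion(&wait.comp);

	skipped = 0;
	for (i = 0; i < table->nr; ++i) {
		struct kioctx *ctx =
			rcu_dereference_protected(table->table[i], true);

		if (!ctx) {
			skipped++;
			continue;
		}
		/*
		 * kill_ioctx() arranges for wait.comp to complete once
		 * every outstanding iocb on this ctx has finished and
		 * dropped its file reference.
		 */
		kill_ioctx(mm, ctx, &wait);
	}

	if (!atomic_sub_and_test(skipped, &wait.count)) {
		/*
		 * The exiting task blocks here, so by the time the
		 * parent reaps it no aio-held file references remain,
		 * and the test's umount can't race against them.
		 */
		wait_for_completion(&wait.comp);
	}
}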
(disclaimer: I haven't studied the io_uring code, so I haven't figured out the approach your patch is taking yet)
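And for completeness, the delayed fput path as I remember it from
fs/file_table.c - again from memory, so details (field names, the
TWA_RESUME flag, etc.) may be off. The point is just that the fallback
list and its work item are global, not per-task, so there's no
per-process flush (a hypothetical flush_delayed_fput_for(task)) we
could lean on from an exit path:

/* Sketch of fs/file_table.c from memory; kernel context, roughly
 * linux/llist.h, linux/workqueue.h, linux/task_work.h: */

static LLIST_HEAD(delayed_fput_list);	/* one global list for everyone */

static void delayed_fput(struct work_struct *unused)
{
	struct llist_node *node = llist_del_all(&delayed_fput_list);
	struct file *f, *t;

	llist_for_each_entry_safe(f, t, node, f_llist)
		__fput(f);
}

static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);

void fput(struct file *file)
{
	if (atomic_long_dec_and_test(&file->f_count)) {
		struct task_struct *task = current;

		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
			init_task_work(&file->f_rcuhead, ____fput);
			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
				return;
			/*
			 * task_work_add() fails for an exiting task -
			 * exactly the kill -9 fsstress case - and we
			 * fall through to the global list instead.
			 */
		}
		if (llist_add(&file->f_llist, &delayed_fput_list))
			schedule_delayed_work(&delayed_fput_work, 1);
	}
}

/* The only flush we have drains everything, globally: */
void flush_delayed_fput(void)
{
	delayed_fput(NULL);
}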