Re: [loop] 322c4293ec: xfstests.xfs.049.fail

Jan Kara <jack@xxxxxxx> · Mon, 20 Dec 2021 12:58:23 +0100

On Mon 20-12-21 00:45:46, Tetsuo Handa wrote:
> On 2021/12/20 0:09, kernel test robot wrote:
> >     @@ -13,3 +13,5 @@
> >      --- clean
> >      --- umount ext2 on xfs
> >      --- umount xfs
> >     +!!! umount xfs failed
> >     +(see /lkp/benchmarks/xfstests/results//xfs/049.full for details)
> >     ...
> >     (Run 'diff -u /lkp/benchmarks/xfstests/tests/xfs/049.out /lkp/benchmarks/xfstests/results//xfs/049.out.bad'  to see the entire diff)
> 
> Yes, we know this race condition can happen.
> 
> https://lkml.kernel.org/r/16c7d304-60ef-103f-1b2c-8592b48f47c6@xxxxxxxxxxxxxxxxxxx
> https://lkml.kernel.org/r/YaYfu0H2k0PSQL6W@xxxxxxxxxxxxx
> 
> Should we try to wait for autoclear operation to complete?

So I think we should try to fix this because, as Dave writes in the
changelog for a1ecac3b0656 ("loop: Make explicit loop device destruction
lazy") which started all this, random EBUSY failures (from either
losetup or umount) are annoying and you need to work around them in lots of
unexpected places.

We cannot easily wait for work completion in the loop device code without
reintroducing the deadlock: the whole of lo_release() is called under
disk->open_mutex, which you also need to grab in __loop_clr_fd(). So to
avoid holding the backing file busy longer than expected, we could use
task_work instead of ordinary work as I suggested. You were right, though,
that we need to be somewhat careful: in case we are running in a kthread, we
would still need to offload to a normal work item (but in that case we don't
care about delaying the file release anyway).
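A rough userspace model of that dispatch, only to illustrate the ordering and not the actual kernel change. Every model_* name is hypothetical. The per-task "pending" array stands in for task_work_add(), the global queue stands in for queue_work() on something like system_long_wq, and is_kthread stands in for checking current->flags & PF_KTHREAD. The point it shows is that lo_release() only schedules the autoclear, so the clearing itself never runs under open_mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model (hypothetical names throughout): defer the loop
 * autoclear out of lo_release() so it runs after disk->open_mutex is
 * dropped. Ordinary tasks get "task work" run before they return to
 * user space; kthreads never do that, so they use a workqueue model.
 */

#define MAX_DEFERRED 8

typedef void (*rundown_fn)(void *arg);

struct deferred {
	rundown_fn fn;
	void *arg;
};

struct model_task {
	bool is_kthread;                        /* models PF_KTHREAD check */
	struct deferred pending[MAX_DEFERRED];  /* models task_work list */
	size_t n_pending;
};

struct model_loop {
	bool open_mutex_held;    /* models disk->open_mutex */
	bool backing_file_busy;  /* true until the autoclear has run */
};

/* Models a workqueue such as system_long_wq. */
static struct deferred model_wq[MAX_DEFERRED];
static size_t model_wq_len;

/* The autoclear itself; it must never run under open_mutex. */
static void model_rundown(void *arg)
{
	struct model_loop *lo = arg;

	assert(!lo->open_mutex_held);
	lo->backing_file_busy = false;
}

/* Prefer task work; fall back to the workqueue for kthreads. */
static bool model_schedule_rundown(struct model_task *t, struct model_loop *lo)
{
	struct deferred d = { model_rundown, lo };

	if (!t->is_kthread && t->n_pending < MAX_DEFERRED) {
		t->pending[t->n_pending++] = d;
		return true;
	}
	if (model_wq_len < MAX_DEFERRED) {
		model_wq[model_wq_len++] = d;
		return true;
	}
	return false;  /* queue full: nothing scheduled */
}

/* Last close of the device: only schedule, never clear inline. */
static void model_lo_release(struct model_task *t, struct model_loop *lo)
{
	lo->open_mutex_held = true;
	model_schedule_rundown(t, lo);
	lo->open_mutex_held = false;
}

/* Runs the task's pending work, as on return to user space. */
static void model_return_to_user(struct model_task *t)
{
	for (size_t i = 0; i < t->n_pending; i++)
		t->pending[i].fn(t->pending[i].arg);
	t->n_pending = 0;
}

/* Runs queued work later, as a kworker would. */
static void model_run_workqueue(void)
{
	for (size_t i = 0; i < model_wq_len; i++)
		model_wq[i].fn(model_wq[i].arg);
	model_wq_len = 0;
}
```

For an ordinary task the backing file is no longer busy once model_return_to_user() has run, i.e. before close() would return to the caller, which is what avoids the EBUSY from a following umount or losetup. For a kthread it stays busy until the queued work runs, which is the case where the delay does not matter.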

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR