Re: [PATCH v3 1/5] task_work: export task_work_add()

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Tue, 25 Jan 2022 13:37:09 -0800

On Fri, Jan 21, 2022 at 08:40:02PM +0900, Tetsuo Handa wrote:
> Commit 322c4293ecc58110 ("loop: make autoclear operation asynchronous")
> silenced a circular locking dependency warning by moving autoclear
> operation to WQ context.
> 
> Then, it was reported that WQ context is too late to run autoclear
> operation; some userspace programs (e.g. xfstest) assume that the autoclear
> operation already completed by the moment close() returns to user mode
> so that they can immediately call umount() of a partition containing a
> backing file which the autoclear operation should have closed.
> 
> Then, Jan Kara found that fundamental problem is that waiting for I/O
> completion (from blk_mq_freeze_queue() or flush_workqueue()) with
> disk->open_mutex held has possibility of deadlock.
> 
> Then, I found that since disk->open_mutex => lo->lo_mutex dependency is
> recorded by lo_open() and lo_release(), and blk_mq_freeze_queue() by e.g.
> loop_set_status() waits for I/O completion with lo->lo_mutex held, from
> locking dependency chain perspective we need to avoid holding lo->lo_mutex
>  from lo_open() and lo_release(). And we can avoid holding lo->lo_mutex
>  from lo_open(), for we can instead use a spinlock dedicated for
> Lo_deleting check.
> 
> But we cannot avoid holding lo->lo_mutex from lo_release(), for WQ context
> was too late to run autoclear operation. We need to make whole lo_release()
> operation start without disk->open_mutex and complete before returning to
> user mode. One of approaches that can meet such requirement is to use the
> task_work context. Thus, export task_work_add() for the loop driver.
> 
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>

5.17-rc1 came out and I saw regressions across the board when xfs/049
and xfs/073 ran.

xfs/049 formats XFS on a block device, mounts it, creates a sparse file
inside the XFS fs, formats the sparse file, mounts that (via automatic
loop device), does some work, and then unmounts both the sparse file
filesystem and the outer XFS filesystem.

The first unmount no longer waited for the loop device to release
asynchronously so the unmount of the outer fs fails because it's still
in use.

So this series fixes xfs/049, but xfs/073 is still broken.  xfs/073
creates a sparse file containing an XFS filesystem and then does this in
rapid succession:

mount -o loop <mount options that guarantee mount failure>
mount -o loop <mount options that should work>

Whereas with 5.16 this worked fine,

 [U] try mount 1
 loop0: detected capacity change from 0 to 83968
 XFS (loop0): Filesystem has duplicate UUID 924e8033-a130-4f9c-a11f-52f892c268e9 - can't mount
 [U] try mount 2
 loop0: detected capacity change from 0 to 83968
 XFS (loop0): Mounting V5 Filesystem
 XFS (loop0): resetting quota flags
 XFS (loop0): Ending clean mount

in 5.17-rc1 it fails like this:

 [U] try mount 1
 loop0: detected capacity change from 0 to 83968
 XFS (loop0): Filesystem has duplicate UUID 0b0afdac-5c9c-4d94-9b8d-fe85a2eb1143 - can't mount
 [U] try mount 2
 I/O error, dev loop0, sector 0 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
 XFS (loop0): SB validate failed with error -5.
 [U] fail mount 2

I guess this means that mount can grab the loop device after the backing
file has been released but before a new one has been plumbed in?  Or,
seeing the lack of "detected capacity change" for mount 2, maybe the
backing file never gets added?

--D

> ---
>  kernel/task_work.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 1698fbe6f0e1..2a1644189182 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -60,6 +60,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>  
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(task_work_add);
>  
>  /**
>   * task_work_cancel_match - cancel a pending work added by task_work_add()
> -- 
> 2.32.0
>