Re: [PATCH 14/23] generic/032: fix pinned mount failure

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 21 Jan 2025 16:03:23 +1100

On Thu, Jan 16, 2025 at 03:28:49PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> generic/032 now periodically fails with:
> 
>  --- /tmp/fstests/tests/generic/032.out	2025-01-05 11:42:14.427388698 -0800
>  +++ /var/tmp/fstests/generic/032.out.bad	2025-01-06 18:20:17.122818195 -0800
>  @@ -1,5 +1,7 @@
>   QA output created by 032
>   100 iterations
>  -000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  >................<
>  -*
>  -100000
>  +umount: /opt: target is busy.
>  +mount: /opt: /dev/sda4 already mounted on /opt.
>  +       dmesg(1) may have more information after failed mount system call.
>  +cycle mount failed
>  +(see /var/tmp/fstests/generic/032.full for details)
> 
> The root cause of this regression is the _syncloop subshell.  This
> background process runs _scratch_sync, which is actually an xfs_io
> process that calls syncfs on the scratch mount.
> 
> Unfortunately, while the test kills the _syncloop subshell, it doesn't
> actually kill the xfs_io process.  If the xfs_io process is in D state
> running the syncfs, it won't react to the signal, but it will pin the
> mount.  Then the _scratch_cycle_mount fails because the mount is pinned.
> 
> Prior to commit 8973af00ec212f the _syncloop ran sync(1) which avoided
> pinning the scratch filesystem.

How does running sync(1) prevent this? They run the same kernel
code, so I'm a little confused as to why this is a problem caused
by using the syncfs() syscall rather than the sync() syscall...

> Fix this by pgrepping for the xfs_io process and killing and waiting for
> it if necessary.

Change looks fine, though.
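
For anyone following along, the failure mode in the commit message can
be reproduced without a filesystem at all. What follows is only a rough
sketch under stated assumptions, not the code from the patch: syncloop
is a hypothetical stand-in for the test's _syncloop, and "sleep 30"
plays the part of the xfs_io process. Killing the subshell leaves its
child running, so the child has to be found with pgrep and signalled
separately.

```shell
#!/bin/sh
# Hypothetical stand-ins: syncloop mimics the test's _syncloop, and
# "sleep 30" plays the part of the xfs_io process blocked in syncfs.
syncloop() {
	while true; do
		sleep 30
	done
}

syncloop &
loop_pid=$!

# Wait (bounded) for the subshell to fork its child.
child_pid=""
i=0
while [ "$i" -lt 50 ]; do
	child_pid=$(pgrep -P "$loop_pid" -x sleep) && break
	sleep 0.1
	i=$((i + 1))
done

# Kill only the subshell, as the test did, and reap it.
kill "$loop_pid"
wait "$loop_pid" 2>/dev/null

# The child was never signalled, so it is normally still running here.
child_survived=no
if [ -n "$child_pid" ] && kill -0 "$child_pid" 2>/dev/null; then
	child_survived=yes
	echo "child $child_pid outlived the killed subshell"
	kill "$child_pid"
fi
```

Unlike sleep, a process in D state will not act on the signal until its
syscall returns, so a real cleanup also has to wait for the process to
exit before cycling the mount.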

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

-- 
Dave Chinner
david@xxxxxxxxxxxxx