Re: [PATCH] loop: Don't change loop device under exclusive opener

Jan Kara <jack@xxxxxxx> · Wed, 7 Aug 2019 11:45:20 +0200

On Mon 05-08-19 09:41:39, Bart Van Assche wrote:
> On 7/30/19 6:36 AM, Jan Kara wrote:
> > On Tue 30-07-19 12:16:46, Jan Kara wrote:
> > > On Tue 30-07-19 10:36:59, John Lenton wrote:
> > > > On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@xxxxxxx> wrote:
> > > > > 
> > > > > Thanks for the notice and the references. What's your version of
> > > > > util-linux? What your test script does is indeed racy. You have there:
> > > > > 
> > > > > echo Running:
> > > > > for i in {a..z}{a..z}; do
> > > > >      mount $i.squash /mnt/$i &
> > > > > done
> > > > > 
> > > > > So all mount(8) commands will run in parallel and race to setup loop
> > > > > devices with LOOP_SET_FD and mount them. However util-linux (at least in
> > > > > the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> > > > > retries with the new loop device. So at this point I don't see why the patch
> > > > > makes difference... I guess I'll need to reproduce and see what's going on
> > > > > in detail.
> > > > 
> > > > We've observed this in arch with util-linux 2.34, and ubuntu 19.10
> > > > (eoan ermine) with util-linux 2.33.
> > > > 
> > > > just to be clear, the initial reports didn't involve a zany loop of
> > > > mounts, but were triggered by effectively the same thing as systemd
> > > > booted a system with a lot of snaps. The reroducer tries to makes
> > > > things simpler to reproduce :-). FWIW,  systemd versions were 244 and
> > > > 242 for those systems, respectively.
> > > 
> > > Thanks for info! So I think I see what's going on. The two mounts race
> > > like:
> > > 
> > > MOUNT1					MOUNT2
> > > num = ioctl(LOOP_CTL_GET_FREE)
> > > 					num = ioctl(LOOP_CTL_GET_FREE)
> > > ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> > >   - returns OK
> > > 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> > > 					  - acquires exclusine loop$num
> > > 					    reference
> > > mount("/dev/loop$num", ...)
> > >   - sees exclusive reference from MOUNT2 and fails
> > > 					  - sees loop device is already
> > > 					    bound and fails
> > > 
> > > It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
> > > perfectly valid mount. I'll think how to fix this...
> > 
> > So how about attached patch? It fixes the regression for me.
 
Hi Bart,

> A new kernel warning is triggered by blktests block/001 that did not happen
> without this patch. Reverting commit 89e524c04fa9 ("loop: Fix mount(2)
> failure due to race with LOOP_SET_FD") makes that kernel warning disappear.
> Is this reproducible on your setup?

Thanks for report! Hum, no, it seems the warning doesn't trigger in my test
VM. But reviewing the mentioned commit with fresh head, I can see where I
did a mistake during my conversion of blkdev_get(). Does attached patch fix
the warning for you?

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
>From c4cd39244088d8de548264bc33dc9fb8f0f1db2d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Wed, 7 Aug 2019 11:36:47 +0200
Subject: [PATCH] bdev: Fixup error handling in blkdev_get()

Commit 89e524c04fa9 ("loop: Fix mount(2) failure due to race with
LOOP_SET_FD") converted blkdev_get() to use the new helpers for
finishing claiming of a block device. However the conversion botched the
error handling in blkdev_get() and thus the bdev has been marked as held
even in case __blkdev_get() returned error. This led to occasional
warnings with block/001 test from blktests like:

kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0

Correct the error handling.

CC: stable@xxxxxxxxxxxxxxx
Fixes: 89e524c04fa9 ("loop: Fix mount(2) failure due to race with LOOP_SET_FD")
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/block_dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index a6f7c892cb4a..7db181581606 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1754,7 +1754,10 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 
 		/* finish claiming */
 		mutex_lock(&bdev->bd_mutex);
-		bd_finish_claiming(bdev, whole, holder);
+		if (!res)
+			bd_finish_claiming(bdev, whole, holder);
+		else
+			bd_abort_claiming(bdev, whole, holder);
 		/*
 		 * Block event polling for write claims if requested.  Any
 		 * write holder makes the write_holder state stick until
-- 
2.16.4