On 8/7/19 2:45 AM, Jan Kara wrote:
> On Mon 05-08-19 09:41:39, Bart Van Assche wrote:
>> On 7/30/19 6:36 AM, Jan Kara wrote:
>>> On Tue 30-07-19 12:16:46, Jan Kara wrote:
>>>> On Tue 30-07-19 10:36:59, John Lenton wrote:
>>>>> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@xxxxxxx> wrote:
>>>>>>
>>>>>> Thanks for the notice and the references. What's your version of
>>>>>> util-linux? What your test script does is indeed racy. You have there:
>>>>>>
>>>>>> echo Running:
>>>>>> for i in {a..z}{a..z}; do
>>>>>>   mount $i.squash /mnt/$i &
>>>>>> done
>>>>>>
>>>>>> So all mount(8) commands will run in parallel and race to set up loop
>>>>>> devices with LOOP_SET_FD and mount them. However, util-linux (at least
>>>>>> in the current version) seems to handle EBUSY from LOOP_SET_FD just
>>>>>> fine and retries with a new loop device. So at this point I don't see
>>>>>> why the patch makes a difference... I guess I'll need to reproduce it
>>>>>> and see what's going on in detail.
>>>>>
>>>>> We've observed this on Arch with util-linux 2.34, and on Ubuntu 19.10
>>>>> (eoan ermine) with util-linux 2.33.
>>>>>
>>>>> Just to be clear, the initial reports didn't involve a zany loop of
>>>>> mounts, but were triggered by effectively the same thing as systemd
>>>>> booted a system with a lot of snaps. The reproducer just tries to make
>>>>> things simpler to reproduce :-). FWIW, the systemd versions were 244
>>>>> and 242 on those systems, respectively.
>>>>
>>>> Thanks for the info! So I think I see what's going on. The two mounts
>>>> race like:
>>>>
>>>> MOUNT1                                   MOUNT2
>>>> num = ioctl(LOOP_CTL_GET_FREE)
>>>>                                          num = ioctl(LOOP_CTL_GET_FREE)
>>>> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>>>   - returns OK
>>>>                                          ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>>>                                            - acquires exclusive loop$num
>>>>                                              reference
>>>> mount("/dev/loop$num", ...)
>>>>   - sees exclusive reference from
>>>>     MOUNT2 and fails
>>>>                                            - sees loop device is already
>>>>                                              bound and fails
>>>>
>>>> It is a bug in the scheme I've chosen that a racing LOOP_SET_FD can
>>>> block a perfectly valid mount. I'll think about how to fix this...
>>>
>>> So how about the attached patch? It fixes the regression for me.
>
> Hi Bart,
>
>> A new kernel warning is triggered by blktests block/001 that did not
>> happen without this patch. Reverting commit 89e524c04fa9 ("loop: Fix
>> mount(2) failure due to race with LOOP_SET_FD") makes that kernel
>> warning disappear. Is this reproducible on your setup?
>
> Thanks for the report! Hum, no, the warning doesn't seem to trigger in my
> test VM. But reviewing the mentioned commit with a fresh head, I can see
> where I made a mistake during my conversion of blkdev_get(). Does the
> attached patch fix the warning for you?

I've queued this up, Jan. Thanks for taking a look at it.

-- 
Jens Axboe
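
[Editor's note: for context, below is a minimal userspace sketch, not taken
from this thread, of the LOOP_CTL_GET_FREE + LOOP_SET_FD sequence that the
racing mount(8) instances perform, including the retry on EBUSY that Jan
refers to. The function name attach_backing_file and the error handling are
illustrative assumptions, not util-linux or kernel code.]

/* Sketch: bind a backing file to a free loop device, retrying on EBUSY. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

static int attach_backing_file(int file_fd)
{
	int ctl_fd = open("/dev/loop-control", O_RDWR);
	if (ctl_fd < 0)
		return -1;

	for (;;) {
		/* Ask the kernel for a currently free loop device number. */
		int num = ioctl(ctl_fd, LOOP_CTL_GET_FREE);
		if (num < 0)
			break;

		char path[64];
		snprintf(path, sizeof(path), "/dev/loop%d", num);

		int loop_fd = open(path, O_RDWR);
		if (loop_fd < 0)
			break;

		/*
		 * Bind the backing file. Another process that raced through
		 * LOOP_CTL_GET_FREE may already have bound this device, in
		 * which case LOOP_SET_FD fails with EBUSY and we retry with
		 * a new free device.
		 */
		if (ioctl(loop_fd, LOOP_SET_FD, file_fd) == 0) {
			close(ctl_fd);
			return loop_fd;	/* caller can now mount /dev/loopN */
		}

		close(loop_fd);
		if (errno != EBUSY)
			break;
	}

	close(ctl_fd);
	return -1;
}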