Re: [PATCH v2 2/2] loop: use task_work for autoclear operation

Jan Kara <jack@xxxxxxx> · Mon, 10 Jan 2022 14:42:34 +0100

On Mon 10-01-22 20:28:28, Tetsuo Handa wrote:
> On 2022/01/10 19:30, Jan Kara wrote:
> > Eek, I agree that the patch may improve the situation but is the complexity
> > and ugliness really worth it?
> 
> If we are clear about
> 
>   Now your patch will fix this lockdep complaint but we
>   still would wait for the write to complete through blk_mq_freeze_queue()
>   (just lockdep is not clever enough to detect this). So IHMO if there was a
>   deadlock before, it will be still there with your changes.
> 
> in https://lkml.kernel.org/r/20211223134050.GD19129@xxxxxxxxxxxxxx ,
> we can go with the revert approach.
> 
> I want to call blk_mq_freeze_queue() without disk->open_mutex held.
> But currently lo_release() is calling blk_mq_freeze_queue() with
> disk->open_mutex held. My patch is going towards doing locklessly
> where possible.

I see. But:
a) We didn't fully establish a real deadlock scenario from the lockdep
report, did we? The lockdep report involved suspend locks, some locks on
accessing files in /proc etc. and it was not clear whether it all reflects
a real deadlock possibility or just a fact that lockdep tracking is rather
coarse-grained at times. Now lockdep report is unpleasant and loop device
locking was ugly anyway so your async change made sense but once lockdep is
silenced we should really establish whether there is real deadlock and more
work is needed or not.

b) Unless we have a realistic plan of moving *all* blk_mq_freeze_queue()
calls from under open_mutex in loop driver, moving stuff "where possible"
from under open_mutex is IMO just muddying water.

> >                               I mean using task work in
> > loop_schedule_rundown() makes a lot of sense because the loop
> > 
> > while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> > 
> > will not fail because of dangling work in the workqueue after umount ->
> > __loop_clr_fd().
> 
> Using task work from lo_release() is for handling close() => umount() sequence.
> Using task work in lo_open() is for handling close() => open() sequence.
> The mount in
> 
>   while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> 
> fails unless lo_open() waits for __loop_clr_fd() to complete.

Why? If we use task work, we are guaranteed the loop device is cleaned up
before umount returns unless some other process also has the loop device
open.

> >                  But when other processes like systemd-udevd start to mess
> > with the loop device, then you have no control whether the following mount
> > will see isofs.iso busy or not - it depends on when systemd-udevd decides
> > to close the loop device.
> 
> Right. But regarding that part the main problem is that the script is not
> checking for errors. What we are expected to do is to restore barrier
> which existed before commit 322c4293ecc58110 ("loop: make autoclear
> operation asynchronous").

OK, can you explain what you exactly mean by the barrier? Because it my
understanding your patch only makes one race somewhat less likely.

> 
> >                           What your waiting in lo_open() achieves is only
> > that if __loop_clr_fd() from systemd-udevd happens to run at the same time
> > as lo_open() from mount, then we won't see EBUSY.
> 
> My waiting in lo_open() is to fix a regression that
> 
>   while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> 
> started to fail.

OK, so you claim this loop fails even with using task work in __loop_clr_fd(),
correct? And does it fail even without systemd-udev?

> >                                                   But IMO that is not worth
> > the complexity in lo_open() because if systemd-udevd happens to close the
> > loop device a millisecond later, you will get EBUSY anyway (and you would
> > get it even in the past) Or am I missing something?
> 
> Excuse me, but lo_open() does not return EBUSY?
> What operation are you talking about?

I didn't mean EBUSY from lo_open() but EBUSY from LOOP_SET_FD ioctl(2). But
maybe I misunderstood the failure. Where exactly does mount(1) fail? Because
with the options you mention it should do something like:
  ioctl(LOOP_CTL_GET_FREE) -> get free loop dev
  ioctl(LOOP_SET_FD) -> bind isofs.iso to free loop dev
  mount(loop_dev, isofs/)

and I though it is the second syscall that fails.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR