On Wed, Feb 26, 2020 at 12:02:38PM -0500, Qian Cai wrote:
> On Mon, 2020-02-24 at 17:17 -0500, Dan Schatzberg wrote:
> > Existing uses of loop device may have multiple cgroups reading/writing
> > to the same device. Simply charging resources for I/O to the backing
> > file could result in priority inversion where one cgroup gets
> > synchronously blocked, holding up all other I/O to the loop device.
> >
> > In order to avoid this priority inversion, we use a single workqueue
> > where each work item is a "struct loop_worker" which contains a queue of
> > struct loop_cmds to issue. The loop device maintains a tree mapping blk
> > css_id -> loop_worker. This allows each cgroup to independently make
> > forward progress issuing I/O to the backing file.
> >
> > There is also a single queue for I/O associated with the rootcg which
> > can be used in cases of extreme memory shortage where we cannot allocate
> > a loop_worker.
> >
> > The locking for the tree and queues is fairly heavy handed - we acquire
> > the per-loop-device spinlock any time either is accessed. The existing
> > implementation serializes all I/O through a single thread anyways, so I
> > don't believe this is any worse.
> >
> > Signed-off-by: Dan Schatzberg <schatzberg.dan@xxxxxxxxx>
> > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>
> The locking in loop_free_idle_workers() will trigger this with sysfs reading,
>
> [ 7080.047167] LTP: starting read_all_sys (read_all -d /sys -q -r 10)
> [ 7239.842276] cpufreq transition table exceeds PAGE_SIZE. Disabling
>
> [ 7247.054961] =====================================================
> [ 7247.054971] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> [ 7247.054983] 5.6.0-rc3-next-20200226 #2 Tainted: G O
> [ 7247.054992] -----------------------------------------------------
> [ 7247.055002] read_all/8513 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 7247.055014] c0000006844864c8 (&fs->seq){+.+.}, at: file_path+0x24/0x40
> [ 7247.055041]
> and this task is already holding:
> [ 7247.055061] c0002006bab8b928 (&(&lo->lo_lock)->rlock){..-.}, at:
> loop_attr_do_show_backing_file+0x3c/0x120 [loop]
> [ 7247.055078] which would create a new lock dependency:
> [ 7247.055105] (&(&lo->lo_lock)->rlock){..-.} -> (&fs->seq){+.+.}
> [ 7247.055125]
> but this new dependency connects a SOFTIRQ-irq-safe lock:
> [ 7247.055155] (&(&lo->lo_lock)->rlock){..-.}
> [ 7247.055156]
> ... which became SOFTIRQ-irq-safe at:
> [ 7247.055196] lock_acquire+0x130/0x360
> [ 7247.055221] _raw_spin_lock_irq+0x68/0x90
> [ 7247.055230] loop_free_idle_workers+0x44/0x3f0 [loop]
> [ 7247.055242] call_timer_fn+0x110/0x5f0
> [ 7247.055260] run_timer_softirq+0x8f8/0x9f0
> [ 7247.055278] __do_softirq+0x34c/0x8c8
> [ 7247.055288] irq_exit+0x16c/0x1d0
> [ 7247.055298] timer_interrupt+0x1f0/0x680
> [ 7247.055308] decrementer_common+0x124/0x130
> [ 7247.055328] arch_local_irq_restore.part.8+0x34/0x90
> [ 7247.055352] cpuidle_enter_state+0x11c/0x8f0
> [ 7247.055361] cpuidle_enter+0x50/0x70
> [ 7247.055389] call_cpuidle+0x4c/0x90
> [ 7247.055398] do_idle+0x378/0x470
> [ 7247.055414] cpu_startup_entry+0x3c/0x40
> [ 7247.055442] start_secondary+0x7a8/0xa80
> [ 7247.055461] start_secondary_prolog+0x10/0x14

That's kind of hilarious.

So even though it's a spin_lock_irq(), suggesting it's used from both
process and irq context, Dan appears to be adding the first user that
actually runs from irq context. It looks like it should have been a
regular spinlock all along. Until now, anyway.

Fixing it should be straightforward: use get_file() under the lock to
pin the file, drop the lock to do file_path(), then release the file
with fput().
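
To make the ordering concrete, here is a rough userspace model of that
pattern, not the actual loop driver code. Everything named model_* is
hypothetical: a pthread mutex stands in for lo->lo_lock, an atomic
counter stands in for the struct file reference count, and snprintf()
stands in for file_path(). The only point is that the reference is taken
under the lock and the slow path lookup happens after the lock is
dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct file: a refcount plus a path. */
struct model_file {
	atomic_int refcount;
	const char *path;
};

/* Hypothetical stand-in for struct loop_device. */
struct model_loop {
	pthread_mutex_t lock;        /* plays the role of lo->lo_lock */
	struct model_file *backing;  /* plays the role of lo->lo_backing_file */
};

/* Models get_file(): take an extra reference and return the file. */
static struct model_file *model_get_file(struct model_file *f)
{
	atomic_fetch_add(&f->refcount, 1);
	return f;
}

/* Models fput(): drop one reference. */
static void model_fput(struct model_file *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
}

/*
 * Models the sysfs show path: pin the file under the lock, drop the
 * lock, do the path lookup unlocked, then release the reference.
 * Returns the snprintf() result, or -1 when no backing file is set.
 */
int model_show_backing_file(struct model_loop *lo, char *buf, size_t len)
{
	struct model_file *f = NULL;
	int n;

	pthread_mutex_lock(&lo->lock);
	if (lo->backing)
		f = model_get_file(lo->backing);
	pthread_mutex_unlock(&lo->lock);

	if (!f)
		return -1;

	/* Stand-in for file_path(); runs without the device lock held. */
	n = snprintf(buf, len, "%s\n", f->path);
	model_fput(f);
	return n;
}
```

With this ordering the model never holds its lock across the path
lookup, which is the dependency lockdep complained about above, and the
extra reference keeps the file alive even if the backing file pointer is
cleared right after the unlock.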