Re: [PATCH v2 0/4] Fix device_lock deadlock on two probe() paths

Eric Farman <farman@xxxxxxxxxxxxx> · Fri, 18 Aug 2023 14:00:21 -0400

On Fri, 2023-08-18 at 13:06 -0300, Jason Gunthorpe wrote:
> On Fri, Aug 18, 2023 at 05:56:20PM +0200, Joerg Roedel wrote:
> > On Thu, Aug 17, 2023 at 03:33:16PM -0300, Jason Gunthorpe wrote:
> > > Bascially.. Yikes!
> > 
> > Hmm, that is a difficult situation. Even if the problem is a misuse
> > of
> > the APIs we can not just blindly break other drivers by our core
> > changes.
> 
> They are not broken, they just throw a lockdep warning and keep going
> as before. This is what triggers:
> 
> static inline void device_lock_assert(struct device *dev)
> {
>         lockdep_assert_held(&dev->mutex);
> }
> 
> So non-debug builds won't even see anything.
> 
> > We need to resolve this situation pretty soon, otherwise I need to
> > remove the locking rework patches from the IOMMU tree until the
> > callers are fixed.
> > 
> > Is there a way to keep the broken drivers working for now?
> 
> Historically we've tolerated lockdep warnings as a way to motivate
> people who care to fix their stuff properly. eg the Intel VT-D had a
> lockdep warning at kernel boot for many releases before it was fixed.
> 
> The series doesn't make things any functionally worse for these
> places
> misusing the API, but it now does throw a warning in some cases.

Hi Jason,

Well, I'm trying to chase down an apparent deadlock in the mdev cases
that is introduced with the commit [1] blamed by these fixes. Seems
that when iommu_group_{add|remove}_device gets called out of vfio (for
example, vfio-ap or -ccw), the device lock is already held so
attempting to get it again isn't going to go well.

As near as I see, mdev_device_create calls device_driver_attach, which
acquires the lock and eventually gets us into the vfio code, and then
vfio_device_set_group get us into the iommu side. Not sure how to best
untangle that.

I'm puzzled why lockdep isn't shouting over this for me, so I added a
lockdep_assert_not_held() in those paths to get a proper callchain:

[   49.319508] ------------[ cut here ]------------
[   49.319546] WARNING: CPU: 0 PID: 4472 at drivers/iommu/iommu.c:1216
iommu_group_add_device+0xfc/0x120
[   49.319562] Modules linked in: vhost_vsock
vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb algif_hash
af_alg lcs ctcm fsm kvm xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bridge
stp llc vfio_ccw zcrypt_cex4 eadm_sch mdev vfio_iommu_type1 vfio loop
configfs zram zsmalloc ghash_s390 prng aes_s390 des_s390 libdes qeth_l2
sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core
dm_multipath dm_mod autofs4
[   49.319721] CPU: 0 PID: 4472 Comm: python Kdump: loaded Not tainted
6.5.0-rc6-next-20230818-dirty #2
[   49.319728] Hardware name: IBM 2964 NE1 749 (LPAR)
[   49.319734] Krnl PSW : 0704f00180000000 0000000190aab690
(iommu_group_add_device+0x100/0x120)
[   49.319748]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3
PM:0 RI:0 EA:3
[   49.319758] Krnl GPRS: 0000000000000000 0000000100000000
0000000000000001 00000000ae8924f0
[   49.319765]            00000001bc05a920 000000008e7f8000
0000000000000000 0000000000000001
[   49.319772]            00000000ae892400 00000000ae892400
000000008d23b200 0000000000000000
[   49.319779]            000003ff901d8d18 000003ff90616a08
0000000190aab686 0000038005c8ba50
[   49.319792] Krnl Code: 0000000190aab680:
c0e5001e5624	brasl	%r14,0000000190e762c8
                          0000000190aab686:
ec26ff9f017e	cij	%r2,1,6,0000000190aab5c4
                         #0000000190aab68c:
af000000		mc	0,0
                         >0000000190aab690:
a7f4ff9a		brc	15,0000000190aab5c4
                          0000000190aab694:
b9040028		lgr	%r2,%r8
                          0000000190aab698:
c0e5001eb210	brasl	%r14,0000000190e81ab8
                          0000000190aab69e:
182b		lr	%r2,%r11
                          0000000190aab6a0:
eb6ff0a00004	lmg	%r6,%r15,160(%r15)
[   49.319872] Call Trace:
[   49.319878]  [<0000000190aab690>] iommu_group_add_device+0x100/0x120
[   49.319886] ([<0000000190aab686>] iommu_group_add_device+0xf6/0x120)
[   49.319894]  [<000003ff8002f404>] vfio_device_set_group+0x9c/0x128
[vfio] 
[   49.319909]  [<000003ff8002cf12>]
vfio_register_emulated_iommu_dev+0x6a/0xd0 [vfio] 
[   49.319918]  [<000003ff80083906>] vfio_ccw_mdev_probe+0xce/0x108
[vfio_ccw] 
[   49.319931]  [<0000000190abdbc8>] really_probe+0xd0/0x338 
[   49.319939]  [<0000000190abe3cc>] device_driver_attach+0x5c/0xc8 
[   49.319947]  [<000003ff8003ca30>] mdev_device_create+0x200/0x278
[mdev] 
[   49.319956]  [<000003ff8003ce1c>] create_store+0x8c/0xb8 [mdev] 
[   49.319964]  [<000000019059fba2>] kernfs_fop_write_iter+0x142/0x1d8 
[   49.319975]  [<00000001904c7c76>] vfs_write+0x1de/0x440 
[   49.319986]  [<00000001904c814e>] ksys_write+0x6e/0x100 
[   49.320024]  [<0000000190e75bfe>] __do_syscall+0x1de/0x208 
[   49.320033]  [<0000000190e8c470>] system_call+0x70/0x98 
[   49.320042] 5 locks held by python/4472:
[   49.320048]  #0: 0000000086d86440 (sb_writers#3){.+.+}-{0:0}, at:
ksys_write+0x6e/0x100
[   49.320074]  #1: 000000008d239c90 (&of->mutex){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0x102/0x1d8
[   49.320095]  #2: 000000008ad95428 (kn->active#266){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0x10e/0x1d8
[   49.320119]  #3: 00000000ae888b78 (&parent->unreg_sem){.+.+}-{3:3},
at: mdev_device_create+0x1b0/0x278 [mdev]
[   49.320141]  #4: 00000000ae8924f0 (&dev->mutex){....}-{3:3}, at:
device_driver_attach+0x4e/0xc8
[   49.320162] Last Breaking-Event-Address:
[   49.320168]  [<0000000190f7734c>] __s390_indirect_jump_r14+0x0/0xc
[   49.320179] irq event stamp: 147555
[   49.320185] hardirqs last  enabled at (147563): [<00000001901edcb4>]
console_unlock+0x10c/0x118
[   49.320194] hardirqs last disabled at (147570): [<00000001901edc96>]
console_unlock+0xee/0x118
[   49.320201] softirqs last  enabled at (147304): [<0000000190e8e674>]
__do_softirq+0x47c/0x510
[   49.320209] softirqs last disabled at (147295): [<000000019015732a>]
__irq_exit_rcu+0x11a/0x130
[   49.320218] ---[ end trace 0000000000000000 ]---

Thanks,
Eric

[1] a16b5d1681ab ("iommu: Complete the locking for dev->iommu_group")

> 
> IMHO I'd rather keep the warning rather than supress it by adding
> device_locks(). Do you agree?
> 
> Jason