On Fri, 2023-08-18 at 13:06 -0300, Jason Gunthorpe wrote: > On Fri, Aug 18, 2023 at 05:56:20PM +0200, Joerg Roedel wrote: > > On Thu, Aug 17, 2023 at 03:33:16PM -0300, Jason Gunthorpe wrote: > > > Bascially.. Yikes! > > > > Hmm, that is a difficult situation. Even if the problem is a misuse > > of > > the APIs we can not just blindly break other drivers by our core > > changes. > > They are not broken, they just throw a lockdep warning and keep going > as before. This is what triggers: > > static inline void device_lock_assert(struct device *dev) > { > lockdep_assert_held(&dev->mutex); > } > > So non-debug builds won't even see anything. > > > We need to resolve this situation pretty soon, otherwise I need to > > remove the locking rework patches from the IOMMU tree until the > > callers are fixed. > > > > Is there a way to keep the broken drivers working for now? > > Historically we've tolerated lockdep warnings as a way to motivate > people who care to fix their stuff properly. eg the Intel VT-D had a > lockdep warning at kernel boot for many releases before it was fixed. > > The series doesn't make things any functionally worse for these > places > misusing the API, but it now does throw a warning in some cases. Hi Jason, Well, I'm trying to chase down an apparent deadlock in the mdev cases that is introduced with the commit [1] blamed by these fixes. Seems that when iommu_group_{add|remove}_device gets called out of vfio (for example, vfio-ap or -ccw), the device lock is already held so attempting to get it again isn't going to go well. As near as I see, mdev_device_create calls device_driver_attach, which acquires the lock and eventually gets us into the vfio code, and then vfio_device_set_group get us into the iommu side. Not sure how to best untangle that. I'm puzzled why lockdep isn't shouting over this for me, so I added a lockdep_assert_not_held() in those paths to get a proper callchain: [ 49.319508] ------------[ cut here ]------------ [ 49.319546] WARNING: CPU: 0 PID: 4472 at drivers/iommu/iommu.c:1216 iommu_group_add_device+0xfc/0x120 [ 49.319562] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb algif_hash af_alg lcs ctcm fsm kvm xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bridge stp llc vfio_ccw zcrypt_cex4 eadm_sch mdev vfio_iommu_type1 vfio loop configfs zram zsmalloc ghash_s390 prng aes_s390 des_s390 libdes qeth_l2 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core dm_multipath dm_mod autofs4 [ 49.319721] CPU: 0 PID: 4472 Comm: python Kdump: loaded Not tainted 6.5.0-rc6-next-20230818-dirty #2 [ 49.319728] Hardware name: IBM 2964 NE1 749 (LPAR) [ 49.319734] Krnl PSW : 0704f00180000000 0000000190aab690 (iommu_group_add_device+0x100/0x120) [ 49.319748] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 49.319758] Krnl GPRS: 0000000000000000 0000000100000000 0000000000000001 00000000ae8924f0 [ 49.319765] 00000001bc05a920 000000008e7f8000 0000000000000000 0000000000000001 [ 49.319772] 00000000ae892400 00000000ae892400 000000008d23b200 0000000000000000 [ 49.319779] 000003ff901d8d18 000003ff90616a08 0000000190aab686 0000038005c8ba50 [ 49.319792] Krnl Code: 0000000190aab680: c0e5001e5624 brasl %r14,0000000190e762c8 0000000190aab686: ec26ff9f017e cij %r2,1,6,0000000190aab5c4 #0000000190aab68c: af000000 mc 0,0 >0000000190aab690: a7f4ff9a brc 15,0000000190aab5c4 0000000190aab694: b9040028 lgr %r2,%r8 0000000190aab698: c0e5001eb210 brasl %r14,0000000190e81ab8 0000000190aab69e: 182b lr %r2,%r11 0000000190aab6a0: eb6ff0a00004 lmg %r6,%r15,160(%r15) [ 49.319872] Call Trace: [ 49.319878] [<0000000190aab690>] iommu_group_add_device+0x100/0x120 [ 49.319886] ([<0000000190aab686>] iommu_group_add_device+0xf6/0x120) [ 49.319894] [<000003ff8002f404>] vfio_device_set_group+0x9c/0x128 [vfio] [ 49.319909] [<000003ff8002cf12>] vfio_register_emulated_iommu_dev+0x6a/0xd0 [vfio] [ 49.319918] [<000003ff80083906>] vfio_ccw_mdev_probe+0xce/0x108 [vfio_ccw] [ 49.319931] [<0000000190abdbc8>] really_probe+0xd0/0x338 [ 49.319939] [<0000000190abe3cc>] device_driver_attach+0x5c/0xc8 [ 49.319947] [<000003ff8003ca30>] mdev_device_create+0x200/0x278 [mdev] [ 49.319956] [<000003ff8003ce1c>] create_store+0x8c/0xb8 [mdev] [ 49.319964] [<000000019059fba2>] kernfs_fop_write_iter+0x142/0x1d8 [ 49.319975] [<00000001904c7c76>] vfs_write+0x1de/0x440 [ 49.319986] [<00000001904c814e>] ksys_write+0x6e/0x100 [ 49.320024] [<0000000190e75bfe>] __do_syscall+0x1de/0x208 [ 49.320033] [<0000000190e8c470>] system_call+0x70/0x98 [ 49.320042] 5 locks held by python/4472: [ 49.320048] #0: 0000000086d86440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6e/0x100 [ 49.320074] #1: 000000008d239c90 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x102/0x1d8 [ 49.320095] #2: 000000008ad95428 (kn->active#266){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10e/0x1d8 [ 49.320119] #3: 00000000ae888b78 (&parent->unreg_sem){.+.+}-{3:3}, at: mdev_device_create+0x1b0/0x278 [mdev] [ 49.320141] #4: 00000000ae8924f0 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x4e/0xc8 [ 49.320162] Last Breaking-Event-Address: [ 49.320168] [<0000000190f7734c>] __s390_indirect_jump_r14+0x0/0xc [ 49.320179] irq event stamp: 147555 [ 49.320185] hardirqs last enabled at (147563): [<00000001901edcb4>] console_unlock+0x10c/0x118 [ 49.320194] hardirqs last disabled at (147570): [<00000001901edc96>] console_unlock+0xee/0x118 [ 49.320201] softirqs last enabled at (147304): [<0000000190e8e674>] __do_softirq+0x47c/0x510 [ 49.320209] softirqs last disabled at (147295): [<000000019015732a>] __irq_exit_rcu+0x11a/0x130 [ 49.320218] ---[ end trace 0000000000000000 ]--- Thanks, Eric [1] a16b5d1681ab ("iommu: Complete the locking for dev->iommu_group") > > IMHO I'd rather keep the warning rather than supress it by adding > device_locks(). Do you agree? > > Jason