On 2022-07-08 08:52, Baolu Lu wrote:
On 2022/7/6 01:08, Robin Murphy wrote:
That also highlights an issue with intel_iommu_get_resv_regions() taking
dmar_global_lock from within a section where intel_iommu_init() already
holds it, which already exists via probe_acpi_namespace_devices() when
an ANDD device is probed, but gets more obvious with the upcoming change
to iommu_device_register(). Since they are both read locks it manages
not to deadlock in practice, so I'm leaving it here for someone with
more confidence to tackle a larger rework of the locking.
I am trying to reproduce this problem. Strangely, even with
CONFIG_LOCKDEP=y selected, the kernel didn't complain at all. :-)
FWIW, see below for the full report I get with this series applied (my
machine doesn't have any ANDD entries to trigger the existing case).
In fact, the rmrr list in the Intel IOMMU driver is always static after
the ACPI/DMAR tables are parsed. There's no need to protect it with a
lock, hence we can safely remove the down_read()/up_read() below.
IIRC that leads to RCU warnings via for_each_dev_scope(), though. I did
try replacing this down_read() with rcu_read_lock(), but then it doesn't
like the GFP_KERNEL allocation in iommu_alloc_resv_region(), and that's
where I gave up :)
I'm mostly left wondering whether the dmar_drhd_units list really needs
to be RCU protected at all, as that seems to be the root of most of the
problems here.
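For reference, the shape of the RCU-based attempt described above would be something like the sketch below. This is a sketch only, not a tested patch; it assumes the dev-scope lists are valid under rcu_read_lock(), and it runs straight into the allocation problem mentioned, since GFP_KERNEL may sleep and sleeping is illegal inside an RCU read-side critical section:

```c
/* Sketch only, not a tested patch. */
rcu_read_lock();
for_each_rmrr_units(rmrr) {
	for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
				  i, i_dev) {
		if (i_dev != device)
			continue;
		/* iommu_alloc_resv_region() allocates with GFP_KERNEL,
		 * which may sleep - illegal under rcu_read_lock().
		 * Either the allocation would need to move out of the
		 * critical section, or the function would need to take
		 * gfp flags so GFP_ATOMIC could be used here. */
		...
	}
}
rcu_read_unlock();
```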
Cheers,
Robin.
static void intel_iommu_get_resv_regions(struct device *device,
					 struct list_head *head)
{
	int prot = DMA_PTE_READ | DMA_PTE_WRITE;
	struct iommu_resv_region *reg;
	struct dmar_rmrr_unit *rmrr;
	struct device *i_dev;
	int i;

	down_read(&dmar_global_lock);
	for_each_rmrr_units(rmrr) {
		for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
					  i, i_dev) {
Best regards,
baolu
---->8-----
[ 11.421712] pci 0000:00:1b.0: Adding to iommu group 0
[ 11.421977]
[ 11.421978] ============================================
[ 11.421979] WARNING: possible recursive locking detected
[ 11.421981] 5.19.0-rc3-00015-gdc44a2269276 #32 Not tainted
[ 11.421984] --------------------------------------------
[ 11.421985] swapper/0/1 is trying to acquire lock:
[ 11.421986] ffffffffb987b770 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_get_resv_regions+0x28/0x3a0
[ 11.422000]
[ 11.422000] but task is already holding lock:
[ 11.422001] ffffffffb987b770 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_init+0x1638/0x1a08
[ 11.422011]
[ 11.422011] other info that might help us debug this:
[ 11.422013] Possible unsafe locking scenario:
[ 11.422013]
[ 11.422013] CPU0
[ 11.422014] ----
[ 11.422015] lock(dmar_global_lock);
[ 11.422018] lock(dmar_global_lock);
[ 11.422020]
[ 11.422020] *** DEADLOCK ***
[ 11.422020]
[ 11.422021] May be due to missing lock nesting notation
[ 11.422021]
[ 11.422022] 2 locks held by swapper/0/1:
[ 11.422024] #0: ffffffffb987b770 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_init+0x1638/0x1a08
[ 11.422033] #1: ffff8881077a10c0 (&group->mutex){+.+.}-{3:3}, at: bus_iommu_probe+0x139/0x4c0
[ 11.422044]
[ 11.422044] stack backtrace:
[ 11.422046] CPU: 8 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00015-gdc44a2269276 #32
[ 11.422050] Hardware name: LENOVO 30B6S08J03/1030, BIOS S01KT29A 06/20/2016
[ 11.422052] Call Trace:
[ 11.422054] <TASK>
[ 11.422056] dump_stack_lvl+0x45/0x59
[ 11.422064] __lock_acquire.cold+0x131/0x305
[ 11.422075] ? lockdep_hardirqs_on_prepare+0x220/0x220
[ 11.422082] ? lock_is_held_type+0xd7/0x130
[ 11.422090] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 11.422098] lock_acquire+0x165/0x410
[ 11.422102] ? intel_iommu_get_resv_regions+0x28/0x3a0
[ 11.422108] ? lock_release+0x410/0x410
[ 11.422112] ? __iommu_domain_alloc+0xc5/0x130
[ 11.422116] ? iommu_group_alloc_default_domain+0x16/0x90
[ 11.422121] ? bus_iommu_probe+0x26d/0x4c0
[ 11.422126] ? iommu_device_register+0x11e/0x160
[ 11.422130] ? intel_iommu_init+0x16e0/0x1a08
[ 11.422135] ? do_one_initcall+0xb6/0x3c0
[ 11.422140] ? lock_is_held_type+0xd7/0x130
[ 11.422147] down_read+0x97/0x2f0
[ 11.422152] ? intel_iommu_get_resv_regions+0x28/0x3a0
[ 11.422156] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 11.422161] ? rwsem_down_read_slowpath+0xa10/0xa10
[ 11.422166] ? find_held_lock+0x85/0xa0
[ 11.422173] intel_iommu_get_resv_regions+0x28/0x3a0
[ 11.422178] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 11.422185] iommu_create_device_direct_mappings.isra.0+0x11a/0x330
[ 11.422193] ? iommu_map+0x50/0x50
[ 11.422198] ? __iommu_domain_alloc+0xc5/0x130
[ 11.422205] bus_iommu_probe+0x2bc/0x4c0
[ 11.422210] ? iommu_device_register+0xba/0x160
[ 11.422216] ? iommu_group_default_domain+0x20/0x20
[ 11.422221] ? do_raw_spin_lock+0x114/0x1d0
[ 11.422226] ? rwlock_bug.part.0+0x50/0x50
[ 11.422231] ? rwsem_down_read_slowpath+0xa10/0xa10
[ 11.422239] iommu_device_register+0x11e/0x160
[ 11.422244] intel_iommu_init+0x16e0/0x1a08
[ 11.422249] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 11.422254] ? _raw_spin_unlock_irqrestore+0x28/0x50
[ 11.422261] ? lock_release+0x240/0x410
[ 11.422265] ? populate_rootfs+0x26/0x37
[ 11.422272] ? lock_downgrade+0x3a0/0x3a0
[ 11.422277] ? dmar_parse_one_rmrr+0x203/0x203
[ 11.422281] ? lock_is_held_type+0xd7/0x130
[ 11.422286] ? iommu_setup+0x282/0x282
[ 11.422291] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 11.422296] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 11.422301] ? up_write+0xd3/0x260
[ 11.422305] ? iommu_setup+0x282/0x282
[ 11.422309] pci_iommu_init+0x15/0x39
[ 11.422313] do_one_initcall+0xb6/0x3c0
[ 11.422317] ? initcall_blacklisted+0x120/0x120
[ 11.422322] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 11.422327] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 11.422331] ? kasan_unpoison+0x23/0x50
[ 11.422337] ? __kasan_slab_alloc+0x2c/0x80
[ 11.422344] kernel_init_freeable+0x330/0x389
[ 11.422349] ? rest_init+0x1b0/0x1b0
[ 11.422354] kernel_init+0x14/0x130
[ 11.422359] ret_from_fork+0x22/0x30
[ 11.422367] </TASK>