On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
-----Original Message-----
From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
Sent: Thursday, March 13, 2025 7:53 PM
To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
Cc: baolu.lu@xxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; intel-
xe@xxxxxxxxxxxxxxxxxxxxx; iommu@xxxxxxxxxxxxxxx
Subject: Re: Regression on drm-tip
On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
Hello Lu,
Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
This mail is regarding a regression we are seeing in our CI runs [1] on the drm-tip repository.
```
<4>[ 2.856622] WARNING: possible circular locking dependency detected
<4>[ 2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G I
<4>[ 2.856642] ------------------------------------------------------
<4>[ 2.856650] swapper/0/1 is trying to acquire lock:
<4>[ 2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
<4>[ 2.856679] but task is already holding lock:
<4>[ 2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
```
A detailed log can be found in [2].
After bisecting the tree, the following patch [3] seems to be the first "bad" commit:
```
commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
Author: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Date: Fri Feb 28 18:27:26 2025 +0800

iommu/vt-d: Fix suspicious RCU usage
```
We also verified that the issue is not seen when the patch is reverted.
Could you please check why the patch causes this regression and provide a fix if necessary?
Could you please run a quick test to check whether the following fix works?
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index e540092d664d..06debeaec643 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
 		if (iommu->irq || iommu->node != cpu_to_node(cpu))
 			continue;
 
+		/*
+		 * Call dmar_alloc_hwirq() with dmar_global_lock held,
+		 * could cause possible lock race condition.
+		 */
+		up_read(&dmar_global_lock);
 		ret = dmar_set_interrupt(iommu);
-
+		down_read(&dmar_global_lock);
 		if (ret) {
 			pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
 			       (unsigned long long)drhd->reg_base_addr, ret);
Thanks,
baolu
We still see the issue with this change.
I am attempting to reproduce this issue on my MTL machine. I pulled
the test branch from:
https://anongit.freedesktop.org/git/drm-tip.git
and built the test kernel image using the configuration file from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
However, I did not observe the lockdep splat mentioned above after booting.
Is there anything I might have missed?
Thanks,
baolu