RE: Regression on drm-tip

> -----Original Message-----
> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Sunday, March 16, 2025 8:04 AM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx;
> iommu@xxxxxxxxxxxxxxx
> Subject: Re: Regression on drm-tip
> 
> On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
> >
> >
> >> -----Original Message-----
> >> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> >> Sent: Thursday, March 13, 2025 7:53 PM
> >> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@xxxxxxxxx>
> >> Cc: baolu.lu@xxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> >> intel-xe@xxxxxxxxxxxxxxxxxxxxx; iommu@xxxxxxxxxxxxxxx
> >> Subject: Re: Regression on drm-tip
> >>
> >> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
> >>> Hello Lu,
> >>>
> >>> Hope you are doing well. I am Chaitanya from the Linux graphics team
> >>> at Intel.
> >>>
> >>> This mail is regarding a regression we are seeing in our CI runs [1]
> >>> on the drm-tip repository.
> >>>
> >>> ```
> >>> <4>[    2.856622] WARNING: possible circular locking dependency detected
> >>> <4>[    2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G          I
> >>> <4>[    2.856642] ------------------------------------------------------
> >>> <4>[    2.856650] swapper/0/1 is trying to acquire lock:
> >>> <4>[    2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
> >>> <4>[    2.856679]
> >>>                    but task is already holding lock:
> >>> <4>[    2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
> >>> ```
> >>> The detailed log can be found in [2].
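
For context on the splat above: lockdep tracks, per lock class, which locks have been acquired while others were held, and warns when a new acquisition would close a cycle in that ordering graph. The C sketch below is a simplified, hypothetical model of that idea, not lockdep's implementation; the names `model_acquire()`, `model_release()` and `reaches()` are illustrative only. The splat shows physical_node_lock held while iommu_probe_device_lock is acquired, so some reverse dependency (direct or through other locks) must already have been recorded; that other half of the chain is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of lockdep-style ordering checks.
 * Lock classes are small integers; edge[a][b] means "b was acquired
 * while a was held". Adding a->b closes a cycle if b already reaches a. */
#define MAX_CLASSES 8
#define MAX_HELD 8

enum {
    PHYSICAL_NODE_LOCK = 0,  /* stands in for &device->physical_node_lock */
    IOMMU_PROBE_DEVICE_LOCK, /* stands in for iommu_probe_device_lock */
};

static bool edge[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_HELD]; /* stack of currently held lock classes */
static int nheld;

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (edge[from][next] && !seen[next] && reaches(next, to, seen))
            return true;
    return false;
}

/* Records the new ordering edges and returns false if acquiring 'cls'
 * while the current locks are held would close a cycle. */
static bool model_acquire(int cls)
{
    bool ok = true;

    for (int i = 0; i < nheld; i++) {
        bool seen[MAX_CLASSES] = { false };

        if (reaches(cls, held[i], seen))
            ok = false; /* cls -> ... -> held[i] already exists */
        edge[held[i]][cls] = true;
    }
    assert(nheld < MAX_HELD);
    held[nheld++] = cls;
    return ok;
}

/* This model only supports releasing locks in LIFO order. */
static void model_release(int cls)
{
    assert(nheld > 0 && held[nheld - 1] == cls);
    nheld--;
}
```

Usage, assuming a direct reverse ordering was recorded first: acquiring IOMMU_PROBE_DEVICE_LOCK then PHYSICAL_NODE_LOCK, releasing both, and then acquiring them in the splat's order makes the second `model_acquire()` call return false. Real reports often involve longer chains through other locks; the DFS in `reaches()` covers that case as well.
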
> >>>
> >>> After bisecting the tree, the following patch [3] seems to be the
> >>> first "bad" commit:
> >>>
> >>> ```
> >>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
> >>> Author: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> >>> Date:   Fri Feb 28 18:27:26 2025 +0800
> >>>
> >>>     iommu/vt-d: Fix suspicious RCU usage
> >>> ```
> >>>
> >>> We also verified that if we revert the patch the issue is not seen.
> >>>
> >>> Could you please check why the patch causes this regression and
> >>> provide a fix if necessary?
> >>
> >> Can you please run a quick test to check whether the following fix works?
> >>
> >> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >> index e540092d664d..06debeaec643 100644
> >> --- a/drivers/iommu/intel/dmar.c
> >> +++ b/drivers/iommu/intel/dmar.c
> >> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
> >>                   if (iommu->irq || iommu->node != cpu_to_node(cpu))
> >>                           continue;
> >>
> >> +               /*
> >> +                * Calling dmar_alloc_hwirq() with dmar_global_lock held
> >> +                * could lead to a lock race condition.
> >> +                */
> >> +               up_read(&dmar_global_lock);
> >>                   ret = dmar_set_interrupt(iommu);
> >> -
> >> +               down_read(&dmar_global_lock);
> >>                   if (ret) {
> >>                           pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
> >>                                  (unsigned long long)drhd->reg_base_addr, ret);
> >>
> >> Thanks,
> >> baolu
> >
> > We still see the issue with this change.
> 
> I am attempting to reproduce this issue with my MTL machine. I pulled the
> test branch from:
> 
> https://anongit.freedesktop.org/git/drm-tip.git
> 
> and built the test kernel image using the configuration file from:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
> 
> But I did not observe the lockdep splat mentioned above after booting.
> 
> Is there anything I might have missed?
> 

+Suresh, Jani, Lucas

We are seeing this only on Skylake and Kaby Lake machines in our CI runs.

https://intel-gfx-ci.01.org/tree/drm-tip/igt@runner@xxxxxxxxxxxx

Regards

Chaitanya

> Thanks,
> baolu



