> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx> > Sent: Monday, January 29, 2024 10:49 PM > > On 2024/1/29 17:06, Tian, Kevin wrote: > >> From: Ethan Zhao<haifeng.zhao@xxxxxxxxxxxxxxx> > >> Sent: Monday, January 29, 2024 11:49 AM > >> > >> Because surprise removal could happen anytime, e.g. user could request > safe > >> removal to EP(endpoint device) via sysfs and brings its link down to do > >> surprise removal cocurrently. such aggressive cases would cause ATS > >> invalidation request issued to non-existence target device, then deadly > >> loop to retry that request after ITE fault triggered in interrupt context. > >> this patch aims to optimize the ITE handling by checking the target device > >> presence state to avoid retrying the timeout request blindly, thus avoid > >> hard lockup or system hang. > >> > >> Signed-off-by: Ethan Zhao<haifeng.zhao@xxxxxxxxxxxxxxx> > >> --- > >> drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++ > >> 1 file changed, 18 insertions(+) > >> > >> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c > >> index 814134e9aa5a..2e214b43725c 100644 > >> --- a/drivers/iommu/intel/dmar.c > >> +++ b/drivers/iommu/intel/dmar.c > >> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int index, int wait_index, > >> { > >> u32 fault; > >> int head, tail; > >> + u64 iqe_err, ite_sid; > >> struct q_inval *qi = iommu->qi; > >> int shift = qi_shift(iommu); > >> > >> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int index, int wait_index, > >> tail = readl(iommu->reg + DMAR_IQT_REG); > >> tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH; > >> > >> + /* > >> + * SID field is valid only when the ITE field is Set in FSTS_REG > >> + * see Intel VT-d spec r4.1, section 11.4.9.9 > >> + */ > >> + iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG); > >> + ite_sid = DMAR_IQER_REG_ITESID(iqe_err); > >> + > >> writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG); > >> pr_info("Invalidation Time-out Error (ITE) cleared\n"); > >> > >> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int index, int wait_index, > >> head = (head - 2 + QI_LENGTH) % QI_LENGTH; > >> } while (head != tail); > >> > >> + /* > >> + * If got ITE, we need to check if the sid of ITE is the same as > >> + * current ATS invalidation target device, if yes, don't try this > >> + * request anymore if the target device isn't present. > >> + * 0 value of ite_sid means old VT-d device, no ite_sid value. > >> + */ > >> + if (pdev && ite_sid && !pci_device_is_present(pdev) && > >> + ite_sid == pci_dev_id(pci_physfn(pdev))) > >> + return -ETIMEDOUT; > >> + > > since the hardware already reports source id leading to timeout, can't we > > just find the pci_dev according to reported ite_sid? this is a slow path > (either > > due to device in bad state or removed) hence it's not necessary to add > more > > intelligence to pass the pci_dev in, leading to only a partial fix can be > backported. > > > > It's also more future-proof, say if one day the driver allows batching > invalidation > > requests for multiple devices then no need to pass in a list of devices. > > I have ever thought about this solution and gave up in the end due to > the locking issue. > > A batch of qi requests must be handled in the spin lock critical region > to enforce that only one batch of requests is submitted at a time. > Searching pci_dev in this locking region might result in nested locking > issues, and I haven't found a good solution for this yet. > in spin-lock you just get the sid. searching pci_dev can be done outside of the critical region. anyway the handling of -EAGAIN is already after spin_unlock.