On 1/17/2024 1:38 PM, Ethan Zhao wrote:
On 1/17/2024 1:26 PM, Ethan Zhao wrote:
On 1/17/2024 11:24 AM, Baolu Lu wrote:
On 2024/1/15 15:58, Ethan Zhao wrote:
-static int qi_check_fault(struct intel_iommu *iommu, int index,
int wait_index)
+static int qi_check_fault(struct intel_iommu *iommu, int index,
int wait_index,
+ pci_dev *target_pdev)
{
u32 fault;
int head, tail;
+ u64 iqe_err, ice_sid;
struct q_inval *qi = iommu->qi;
int shift = qi_shift(iommu);
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
+ /*
+ * If the ATS invalidation target device is gone this
moment (surprise
+ * removed, died, no response) don't try this request
again. this
+ * request will not get valid result anymore. but the
request was
+ * already submitted to hardware and we predict to get a
ITE in
+ * followed batch of request, if so, it will get handled then.
+ */
We can't leave the ITE triggered by this request for the next one,
which
has no context about why this happened. Perhaps move below code down to
the segment that handles ITEs.
Here, the invalidation request has been issued to hardware but target
device
gone, we can't loop and wait for the ITE for this request to happen,
and we
bail out here because we hold lock_irqsave lock , the ITE still could
happen
with later batch request in the future, though it is not triggered
by that request,
but it could still be cleaned/handled. move it to the fault() segment
?,there means
That moment, the ITE was triggered by previous requests, they are not
in current
context, also shouldn't be retried, they have response time over
expected.
but triggered ITE fault blocks this patch request, we should retry
this batch request.
we just clean the fault and retry it. nothing missed.
Thanks,
Ethan
ITE already happened, no need to check target presence anymore.
did I miss something about the context lost ?
Another concern is about qi_dump_fault(), which pr_err's the fault
message as long as the register is set. Some faults are predictable,
such as cache invalidation for surprise-removed devices.
Unconditionally
reporting errors with pr_err() may lead the user to believe that a more
serious hardware error has occurred. Probably we can refine this
part of
the code as well.
Agree, may refine them in seperated series ?
loop and always retry IQE, ICE don't make sense per my
understanding. if
IQE happened retry it will always reproduce the fault, because
request is the same.
we could fix them together in other patches.
Thanks,
Ethan
Others look sane to me.
+ if (target_pdev && !pci_device_is_present(target_pdev))
+ return -EINVAL;
+
fault = readl(iommu->reg + DMAR_FSTS_REG);
if (fault & (DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE))
qi_dump_fault(iommu, fault);
@@ -1315,6 +1327,13 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
tail = readl(iommu->reg + DMAR_IQT_REG);
tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
+ /*
+ * SID field is valid only when the ITE field is
Set in FSTS_REG
+ * see Intel VT-d spec r4.1, section 11.4.9.9
+ */
+ iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
+ ice_sid = DMAR_IQER_REG_ITESID(iqe_err);
+
writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
pr_info("Invalidation Time-out Error (ITE)
cleared\n");
@@ -1324,6 +1343,16 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
head = (head - 2 + QI_LENGTH) % QI_LENGTH;
} while (head != tail);
+ /*
+ * If got ITE, we need to check if the sid of ITE
is the same as
+ * current ATS invalidation target device, if yes,
don't try this
+ * request anymore, the target device has a
response time beyound
+ * expected. 0 value of ice_sid means old device,
no ice_sid value.
+ */
+ if (target_pdev && ice_sid && ice_sid ==
+ pci_dev_id(pci_physfn(target_pdev))
+ return -ETIMEDOUT;
+
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
}
Best regards,
baolu
new proposed qi_check_fault() change as following:
- remove the lines of
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
reason see the comment in the code.
- others, the same as last proposed code.
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1267,15 +1267,33 @@ static void qi_dump_fault(struct intel_iommu
*iommu, u32 fault)
(unsigned long long)desc->qw1);
}
-static int qi_check_fault(struct intel_iommu *iommu, int index, int
wait_index)
+static int qi_check_fault(struct intel_iommu *iommu, int index, int
wait_index,
+ pci_dev *target_pdev)
{
u32 fault;
int head, tail;
+ u64 iqe_err, ice_sid;
struct q_inval *qi = iommu->qi;
int shift = qi_shift(iommu);
+ /*
+ * If the inv_wait_dsc got QI_ABORT here, means an ITE happens,
retry
+ * this batch of invalidation request without clearing the ITE fault
+ * will be trapped into dead loop. remov this line.
+ * see Intel VT-d spec r4.1 sec 6.5.2.10, 6.5.2.8
+ *
+ * if (qi->desc_status[wait_index] == QI_ABORT)
+ * return -EAGAIN;
+ */
- if (qi->desc_status[wait_index] == QI_ABORT)
- return -EAGAIN;
+ /*
+ * If the ATS invalidation target device is gone this moment
(surprise
+ * removed, died, no response) don't try this request again. this
+ * request will not get valid result anymore. but the request was
+ * already submitted to hardware and we predict to get a ITE in
+ * followed batch of request, if so, it will get handled then.
+ */
+ if (target_pdev && !pci_device_is_present(target_pdev))
+ return -EINVAL;
fault = readl(iommu->reg + DMAR_FSTS_REG);
if (fault & (DMA_FSTS_IQE | DMA_FSTS_ITE | DMA_FSTS_ICE))
@@ -1315,6 +1333,13 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
tail = readl(iommu->reg + DMAR_IQT_REG);
tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
+ /*
+ * SID field is valid only when the ITE field is Set in
FSTS_REG
+ * see Intel VT-d spec r4.1, section 11.4.9.9
+ */
+ iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
+ ice_sid = DMAR_IQER_REG_ITESID(iqe_err);
+
writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
pr_info("Invalidation Time-out Error (ITE) cleared\n");
@@ -1324,6 +1349,16 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
head = (head - 2 + QI_LENGTH) % QI_LENGTH;
} while (head != tail);
+ /*
+ * If got ITE, we need to check if the sid of ITE is the
same as
+ * current ATS invalidation target device, if yes, don't
try this
+ * request anymore, the target device has a response
time beyound
+ * expected. 0 value of ice_sid means old device, no
ice_sid value.
+ */
+ if (target_pdev && ice_sid && ice_sid ==
+ pci_dev_id(pci_physfn(target_pdev))
+ return -ETIMEDOUT;
+
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
}
@@ -1344,7 +1379,7 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index, int wait_index)
* can be part of the submission but it will not be polled for completion.
*/
int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
- unsigned int count, unsigned long options)
+ unsigned int count, unsigned long options, pci_dev
*target_pdev)
{
struct q_inval *qi = iommu->qi;
s64 devtlb_start_ktime = 0;
@@ -1430,7 +1465,7 @@ int qi_submit_sync(struct intel_iommu *iommu,
struct qi_desc *desc,
* a deadlock where the interrupt context can wait
indefinitely
* for free slots in the queue.
*/
- rc = qi_check_fault(iommu, index, wait_index);
+ rc = qi_check_fault(iommu, index, wait_index, target_pdev);
if (rc)
break;
Thanks,
Ethan