On Thu, Dec 07, 2023 at 02:43:08PM +0800, Lu Baolu wrote: > @@ -217,12 +250,9 @@ int iommu_page_response(struct device *dev, > if (!ops->page_response) > return -ENODEV; > > - mutex_lock(¶m->lock); > - fault_param = param->fault_param; > - if (!fault_param) { > - mutex_unlock(¶m->lock); > + fault_param = iopf_get_dev_fault_param(dev); > + if (!fault_param) > return -EINVAL; > - } The refcounting should work by passing around the fault_param object, not re-obtaining it from the dev from a work. The work should be locked to the iommu_fault_param that was active when the work was launched. When we get to iommu_page_response it does this: /* Only send response if there is a fault report pending */ mutex_lock(&fault_param->lock); if (list_empty(&fault_param->faults)) { dev_warn_ratelimited(dev, "no pending PRQ, drop response\n"); goto done_unlock; } Which determines that the iommu_fault_param is stale and pending free.. Also iopf_queue_remove_device() is messed up - it returns an error code but nothing ever does anything with it :( Remove functions like this should never fail. Removal should be like I explained earlier: - Disable new PRI reception - Ack all outstanding PRQ to the device - Disable PRI on the device - Tear down the iopf infrastructure So under this model if the iopf_queue_remove_device() has been called it should be sort of a 'disassociate' action where fault_param is still floating out there but iommu_page_response() does nothing. IOW pass the refcount from the iommu_report_device_fault() down into the fault handler, into the work and then into iommu_page_response() which will ultimately put it back. > @@ -282,22 +313,15 @@ EXPORT_SYMBOL_GPL(iommu_page_response); > */ > int iopf_queue_flush_dev(struct device *dev) > { > - int ret = 0; > - struct iommu_fault_param *iopf_param; > - struct dev_iommu *param = dev->iommu; > + struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev); > > - if (!param) > + if (!iopf_param) > return -ENODEV; And this also seems unnecessary, it is a bug to call this after iopf_queue_remove_device() right? Just rcu_derefernce(param->fault_param, true) and WARN_ON NULL. Jason