On Tue, Mar 02, 2021 at 03:59:57PM -0800, Jacob Pan wrote:
> Hi Jean-Philippe,
>
> A few comments from the p.o.v. of converting VT-d to this framework,
> mostly about potential optimization. I think the VT-d SVA code will be
> able to use this work.
> +Ashok provided many insights.
>
> FWIW,
> Reviewed-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>

Thanks!

> On Tue, 2 Mar 2021 10:26:42 +0100, Jean-Philippe Brucker
> <jean-philippe@xxxxxxxxxx> wrote:
> > +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> > +{
> > +	int ret;
> > +	struct iopf_group *group;
> > +	struct iopf_fault *iopf, *next;
> > +	struct iopf_device_param *iopf_param;
> > +
> > +	struct device *dev = cookie;
> > +	struct dev_iommu *param = dev->iommu;
> > +
> > +	lockdep_assert_held(&param->lock);
> > +
> > +	if (fault->type != IOMMU_FAULT_PAGE_REQ)
> > +		/* Not a recoverable page fault */
> > +		return -EOPNOTSUPP;
> > +
> > +	/*
> > +	 * As long as we're holding param->lock, the queue can't be unlinked
> > +	 * from the device and therefore cannot disappear.
> > +	 */
> > +	iopf_param = param->iopf_param;
> > +	if (!iopf_param)
> > +		return -ENODEV;
> > +
> > +	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> > +		iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
> > +		if (!iopf)
> > +			return -ENOMEM;
> > +
> > +		iopf->fault = *fault;
> > +
> > +		/* Non-last request of a group. Postpone until the last one */
>
> Would be nice to have an option here to allow non-deferred
> handle_mm_fault. Some devices may have a large group.
>
> FYI, VT-d also needs to send a page response before the last one (LPIG):
> "A Page Group Response Descriptor is issued by software in response to a
> page request with data or a page request (with or without data) that
> indicated that it was the last request in a group."
>
> But I think we deal with that when we convert. Perhaps just treat the
> request with data as LPIG.

Sure, that seems fine. Do you mean that the VT-d driver will set the
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE flag for PR-with-data? Otherwise we
could introduce a new flag. I prefer making it explicit rather than
having IOPF consumers infer that a response is needed when they see
IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA set, because private_data wouldn't be
reusable by other architectures.

> Also adding a trace event would be nice, similar to CPU page fault.

Agreed, the tracepoints in my development tree (along with the
lower-level page_request/response tracepoints) have been indispensable
for debugging hardware and SVA applications.

> > +		list_add(&iopf->list, &iopf_param->partial);
> > +
> > +		return 0;
> > +	}
> > +
> > +	group = kzalloc(sizeof(*group), GFP_KERNEL);
> > +	if (!group) {
> > +		/*
> > +		 * The caller will send a response to the hardware. But we do
> > +		 * need to clean up before leaving, otherwise partial faults
> > +		 * will be stuck.
> > +		 */
> > +		ret = -ENOMEM;
> > +		goto cleanup_partial;
> > +	}
> > +
> > +	group->dev = dev;
> > +	group->last_fault.fault = *fault;
> > +	INIT_LIST_HEAD(&group->faults);
> > +	list_add(&group->last_fault.list, &group->faults);
> > +	INIT_WORK(&group->work, iopf_handle_group);
> > +
> > +	/* See if we have partial faults for this group */
> > +	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
> > +		if (iopf->fault.prm.grpid == fault->prm.grpid)
>
> Just curious, the iopf handler is registered per arm_smmu_master dev. Is
> the namespace of group ID also within an arm_smmu_master?
Yes, for both PCIe PRI and SMMU stall the namespace is within one device
(one RequesterID or StreamID).

> Can one arm_smmu_master support multiple devices?

No, it's a single device.

> > For VT-d, group ID is per PCI device.
>
> > +			/* Insert *before* the last fault */
> > +			list_move(&iopf->list, &group->faults);
> > +	}
>
> This is fine for devices that support a small number of outstanding
> PRQs. VT-d is setting the limit to 32, but the incoming DSA device will
> support 512.
>
> So if we pre-sort IOPFs by group ID and put them in a per-group list,
> would it be faster? I mean, once the LPIG comes in, we just need to
> search the list of groups instead of all partial faults. I am not
> against what is done here, just exploring optimization.

Yes, I think that's worth exploring: keeping the groups in an rb_tree or
something like that, for easy access and update. Note that I don't have a
system with non-LPIG faults, so I can't test any of this at the moment.
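To sketch the idea (untested, and the iopf_pending_group/iopf_get_group
names below are made up for illustration): keep one node per pending
group in an rb_tree keyed by grpid, and accumulate partial faults on a
per-group list. The LPIG path then does a single tree lookup plus a
list_splice() instead of walking every partial fault on the device:

#include <linux/list.h>
#include <linux/rbtree.h>
#include <linux/slab.h>

/* Hypothetical per-group node, replacing the flat iopf_param->partial list */
struct iopf_pending_group {
	struct rb_node		node;
	u32			grpid;
	struct list_head	faults;		/* partial iopf_fault entries */
};

/* Find the pending group for @grpid, creating it if none exists yet */
static struct iopf_pending_group *
iopf_get_group(struct rb_root *groups, u32 grpid)
{
	struct rb_node **new = &groups->rb_node, *parent = NULL;
	struct iopf_pending_group *grp;

	while (*new) {
		grp = rb_entry(*new, struct iopf_pending_group, node);
		parent = *new;
		if (grpid < grp->grpid)
			new = &(*new)->rb_left;
		else if (grpid > grp->grpid)
			new = &(*new)->rb_right;
		else
			return grp;
	}

	grp = kzalloc(sizeof(*grp), GFP_KERNEL);
	if (!grp)
		return NULL;

	grp->grpid = grpid;
	INIT_LIST_HEAD(&grp->faults);
	rb_link_node(&grp->node, parent, new);
	rb_insert_color(&grp->node, groups);
	return grp;
}

/*
 * On the last fault (LPIG): detach the whole group in one go. @out would
 * be group->faults, so the spliced partial faults still end up before the
 * last fault, as list_move() does today.
 */
static void iopf_take_group(struct iopf_pending_group *grp,
			    struct rb_root *groups, struct list_head *out)
{
	rb_erase(&grp->node, groups);
	list_splice(&grp->faults, out);
	kfree(grp);
}

iopf_param would hold the rb_root instead of (or alongside) the partial
list, and iopf_queue_discard_partial() would have to walk the whole tree.

Thanks,
Jean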