On 31/05/17 16:27, Nate Watterson wrote:
> Hi Jean-Philippe,
>
> On 5/24/2017 2:01 PM, Jean-Philippe Brucker wrote:
>> PCIe devices can implement their own TLB, named Address Translation Cache
>> (ATC). In order to support Address Translation Service (ATS), the
>> following changes are needed in software:
>>
>> * Enable ATS on endpoints when the system supports it. Both PCI root
>>   complex and associated SMMU must implement the ATS protocol.
>>
>> * When unmapping an IOVA, send an ATC invalidate request to the endpoint
>>   in addition to the usual SMMU IOTLB invalidations.
>>
>> I previously sent this as part of a lengthy RFC [1] adding SVM (ATS +
>> PASID + PRI) support to SMMUv3. The next PASID/PRI version is almost
>> ready, but isn't likely to get merged because it needs hardware testing,
>> so I will send it later. PRI depends on ATS, but ATS should be useful on
>> its own.
>>
>> Without PASID and PRI, ATS is used for accelerating transactions. Instead
>> of having all memory accesses go through SMMU translation, the endpoint
>> can translate IOVA->PA once, store the result in its ATC, then issue
>> subsequent transactions using the PA, partially bypassing the SMMU. So in
>> theory it should be faster while keeping the advantages of an IOMMU,
>> namely scatter-gather and access control.
>>
>> The ATS patches can now be tested on some hardware, even though the lack
>> of compatible PCI endpoints makes it difficult to assess what performance
>> optimizations we need. That's why the ATS implementation is a bit rough at
>> the moment, and we will work on optimizing things like invalidation ranges
>> later.
>
> Sinan and I have tested this series on a QDF2400 development platform
> using a PCIe exerciser card as the ATS capable endpoint. We were able
> to verify that ATS requests complete with a valid translated address
> and that DMA transactions using the pre-translated address "bypass"
> the SMMU.
> Testing ATC invalidations was a bit more difficult as we
> could not figure out how to get the exerciser card to automatically
> send the completion message. We ended up having to write a debugger
> script that would monitor the CMDQ and tell the exerciser to send
> the completion when a hanging CMD_SYNC following a CMD_ATC_INV was
> detected. Hopefully we'll get some real ATS capable endpoints to
> test with soon.

That's still a big step forward from my software tests, thanks a lot for
the report. If you get around to testing a real endpoint, there are a few
data points that would be really useful to compare, if only to see whether
enabling ATS is at all viable, or if we end up getting stuck in
queue_poll_cons in normal conditions:

* ATS enabled/disabled in endpoint
* ATSCHK enabled/disabled in SMMU
* Invalidation duration when ATC entry is present/absent, and the range is
  big/small

Knowing this would indicate if more work is needed on invalidation sizing,
batching, postponing or if we can optimize later.

Thanks,
Jean