On Fri, Nov 21 2014 at 03:48:33 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Mon, 17 Nov 2014, Stefano Stabellini wrote:
>> Hi all,
>> I am writing this email to ask for your advice.
>>
>> On architectures where dma addresses are different from physical
>> addresses, it can be difficult to retrieve the physical address of a
>> page from its dma address.
>>
>> Specifically this is the case for Xen on arm and arm64 but I think that
>> other architectures might have the same issue.
>>
>> Knowing the physical address is necessary to be able to issue any
>> required cache maintenance operations when unmap_page,
>> sync_single_for_cpu and sync_single_for_device are called.
>>
>> Adding a struct page* parameter to unmap_page, sync_single_for_cpu and
>> sync_single_for_device would make Linux dma handling on Xen on arm and
>> arm64 much easier and quicker.
>>
>> I think that other drivers have similar problems, such as the Intel
>> IOMMU driver having to call find_iova and walking down an rbtree to get
>> the physical address in its implementation of unmap_page.
>>
>> Callers have the struct page* in their hands already from the previous
>> map_page call so it shouldn't be an issue for them. A problem does
>> exist however: there are about 280 callers of dma_unmap_page and
>> pci_unmap_page. We have even more callers of the dma_sync_single_for_*
>> functions.
>>
>> Is such a change even conceivable? How would one go about it?
>>
>> I think that Xen would not be the only one to gain from it, but I would
>> like to have a confirmation from others: given the magnitude of the
>> changes involved I would actually prefer to avoid them unless multiple
>> drivers/archs/subsystems could really benefit from them.
>
> Given the lack of interest from the community, I am going to drop this
> idea.

Actually it sounds like the right API design to me. As a bonus it should
help performance a bit as well.
For example, the current implementations of
dma_sync_single_for_{cpu,device} and dma_unmap_page on ARM while using
the IOMMU mapper (arm_iommu_sync_single_for_{cpu,device},
arm_iommu_unmap_page) all call iommu_iova_to_phys which generally
results in a page table walk or a hardware register write/poll/read.

The problem, as you mentioned, is that there are a ton of callers of the
existing APIs. I think David Vrabel had a good suggestion for dealing
with this:

On Mon, Nov 17 2014 at 06:43:46 AM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> You may need to consider a parallel set of map/unmap API calls that
> return/accept a handle, and then converting drivers one-by-one as
> required, instead of trying to convert every single driver at once.

However, I'm not sure whether the costs of having a parallel set of APIs
outweigh the benefits of a cleaner API and a slight performance boost...

But I hope the idea isn't completely abandoned without some profiling or
other evidence of its benefits (e.g. patches showing how drivers could
be simplified with the new APIs).

-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
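
As a rough illustration of the handle-based idea quoted above, here is a
minimal userspace sketch in C. It is not kernel code: every name in it
(mock_map_page, mock_unmap_page_handle, struct mock_dma_handle,
MOCK_DMA_OFFSET, and so on) is hypothetical, and the fixed offset between
physical and DMA addresses is only an assumption standing in for a real
IOMMU or Xen translation. It shows one way a parallel map/unmap pair could
carry the physical address in a handle so that unmap does not need a
reverse lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
typedef uint64_t mock_dma_addr_t;
typedef uint64_t mock_phys_addr_t;

/* Assumption: DMA addresses differ from physical ones by a fixed offset. */
#define MOCK_DMA_OFFSET 0x80000000ULL

/* Counts simulated reverse translations (DMA address -> physical address). */
unsigned long slow_lookups;

/* Stand-in for an expensive lookup such as a page table walk or tree search. */
mock_phys_addr_t mock_iova_to_phys(mock_dma_addr_t dma)
{
    slow_lookups++;
    return dma - MOCK_DMA_OFFSET;
}

/* Placeholder for cache maintenance, which needs the physical address. */
void mock_cache_maintenance(mock_phys_addr_t phys, size_t size)
{
    (void)phys;
    (void)size;
}

/* Existing-style pair: unmap only receives the DMA address. */
mock_dma_addr_t mock_map_page(mock_phys_addr_t phys)
{
    return phys + MOCK_DMA_OFFSET;
}

mock_phys_addr_t mock_unmap_page(mock_dma_addr_t dma, size_t size)
{
    mock_phys_addr_t phys = mock_iova_to_phys(dma); /* reverse lookup */
    mock_cache_maintenance(phys, size);
    return phys;
}

/* Parallel pair: map returns a handle that remembers the physical address. */
struct mock_dma_handle {
    mock_dma_addr_t dma;
    mock_phys_addr_t phys;
};

struct mock_dma_handle mock_map_page_handle(mock_phys_addr_t phys)
{
    struct mock_dma_handle h = { phys + MOCK_DMA_OFFSET, phys };
    return h;
}

mock_phys_addr_t mock_unmap_page_handle(const struct mock_dma_handle *h,
                                        size_t size)
{
    assert(h != NULL);
    mock_cache_maintenance(h->phys, size); /* no reverse lookup needed */
    return h->phys;
}
```

In this model a caller converted to the handle variant keeps the value
returned by mock_map_page_handle() and passes it back at unmap time, while
unconverted callers keep using the old pair unchanged, which is the
one-by-one conversion path described in the quoted suggestion.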