On 10/01/17 13:42, Arnd Bergmann wrote:
> On Tuesday, January 10, 2017 1:25:12 PM CET Robin Murphy wrote:
>> On 10/01/17 12:47, Nikita Yushchenko wrote:
>>>> The point here is that an IOMMU doesn't solve your issue, and the
>>>> IOMMU-backed DMA ops need the same treatment. In light of that, it
>>>> really feels to me like the DMA masks should be restricted in
>>>> of_dma_configure so that the parent mask is taken into account there,
>>>> rather than hooking into each set of DMA ops to intercept
>>>> set_dma_mask. We'd still need to do something to stop dma_set_mask
>>>> widening the mask if it was restricted by of_dma_configure, but I
>>>> think Robin (cc'd) was playing with that.
>>>
>>> What is the issue that an "IOMMU doesn't solve"?
>>>
>>> The issue I'm trying to address is an inconsistency within the swiotlb
>>> dma_map_ops, where (1) any wide mask is silently accepted, but (2) the
>>> mask is then used to decide whether bounce buffers are needed or not.
>>> This inconsistency causes the NVMe+R-Car combo to not work (and to
>>> corrupt memory instead).
>>
>> The fundamental underlying problem is the "any wide mask is silently
>> accepted" part, and that applies equally to the IOMMU ops as well.
>
> It's a much rarer problem for the IOMMU case though, because it only
> impacts devices that are restricted to addressing of less than 32 bits.
>
> If you have an IOMMU enabled, the dma-mapping interface does not care
> if the device can do wider than 32-bit addressing, as it will never
> hand out IOVAs above 0xffffffff.

I can assure you that it will - we constrain allocations to the
intersection of the IOMMU domain aperture (normally the IOMMU's physical
input address width) and the given device's DMA mask. If both of those
are >32 bits then >32-bit IOVAs will fall out. For the arm64/common
implementation I have prototyped a copy of the x86 optimisation which
always first tries to get 32-bit IOVAs for PCI devices, but even then it
can start returning higher addresses if the 32-bit space fills up.
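To make that concrete, here's a tiny standalone sketch of the constraint
(all names here are made up for illustration - this is deliberately
simplified and is not the in-kernel allocator): the allocation cap is
the intersection of the domain aperture and the device's DMA mask, with
a first attempt below 4GB for PCI devices that stops helping once that
space is exhausted.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DMA_BIT_MASK(n)  (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

static uint64_t iova_alloc_limit(uint64_t aperture_end, uint64_t dma_mask,
				 bool is_pci, bool below_4g_exhausted)
{
	/* Never allocate above either the aperture or the device's mask. */
	uint64_t limit = aperture_end < dma_mask ? aperture_end : dma_mask;

	/* x86-style optimisation: prefer 32-bit IOVAs for PCI devices... */
	if (is_pci && !below_4g_exhausted && limit > DMA_BIT_MASK(32))
		return DMA_BIT_MASK(32);

	/* ...but fall back to the full limit once that space fills up. */
	return limit;
}

int main(void)
{
	/* 48-bit IOMMU input width, driver has blindly set a 64-bit mask. */
	printf("initial cap:       0x%llx\n", (unsigned long long)
	       iova_alloc_limit(DMA_BIT_MASK(48), DMA_BIT_MASK(64), true, false));
	printf("once 4GB fills up: 0x%llx\n", (unsigned long long)
	       iova_alloc_limit(DMA_BIT_MASK(48), DMA_BIT_MASK(64), true, true));
	return 0;
}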
>>> I just can't see what similar issue the IOMMU case can have.
>>> Do you mean that in the IOMMU case, the mask must also not be set to
>>> anything wider than its initial value? Why? What is the use of the
>>> mask in the IOMMU case? Is there any real case where an IOMMU can't
>>> address all the memory existing in the system?
>>
>> There's a very subtle misunderstanding there - the DMA mask does not
>> describe the memory a device can address, it describes the range of
>> addresses the device is capable of generating. Yes, in the non-IOMMU
>> case they are equivalent, but once you put an IOMMU in between, the
>> problem is merely shifted from "what range of physical addresses can
>> this device access" to "what range of IOVAs is valid to give to this
>> device" - the fact that those IOVAs can map to any underlying physical
>> address only obviates the need for any bouncing at the memory end; it
>> doesn't remove the fact that the device has a hardware addressing
>> limitation which needs to be accommodated.
>>
>> The thread Will linked to describes that equivalent version of your
>> problem - the IOMMU gives the device 48-bit addresses which get
>> erroneously truncated because it doesn't know that only 42 bits are
>> actually wired up. That situation still requires the device's DMA mask
>> to correctly describe its addressing capability, just as yours does.
>
> That problem should only impact virtual machines which have a guest
> bus address space covering more than 42 bits of physical RAM, whereas
> the problem we have with swiotlb is for the dma-mapping interface.

As above, it impacts DMA API use for anything whose addressing
capability is narrower than the IOMMU's reported input size and whose
driver is able to blindly set a too-big DMA mask. It just happens that
the stars line up on most systems, and for 32-bit devices which keep the
default DMA mask.

I actually have a third variation of this problem involving a PCI root
complex which *could* drive full-width (40-bit) addresses, but won't,
due to the way its PCI<->AXI interface is programmed. That would require
even more complicated dma-ranges handling to describe the windows of
valid physical addresses which it *will* pass, so I'm not pressing the
issue - let's just get the basic DMA mask case fixed first.

>>> With this direction, the semantics of the DMA mask become even more
>>> questionable. I'd say dma_mask is a candidate for removal (or for
>>> moving into swiotlb's or the IOMMU's local area).
>>
>> We still need a way for drivers to communicate a device's probed
>> addressing capability to SWIOTLB, so there's always going to have to
>> be *some* sort of public interface. Personally, the change in
>> semantics I'd like to see is to make dma_set_mask() only fail if DMA
>> is entirely disallowed - in the normal case it would always succeed,
>> but the DMA API implementation would be permitted to set a smaller
>> mask than requested (this is effectively what the x86 IOMMU ops do
>> already).
>
> With swiotlb enabled, it only needs to fail if the mask does not
> contain the swiotlb bounce buffer area, either because the start of RAM
> is outside of the mask, or because the bounce area has been allocated
> at the end of ZONE_DMA and the mask is smaller than ZONE_DMA.

Agreed, I'd managed to overlook that specific case, but I'd be inclined
to consider "impossible" a subset of "disallowed" still :)

Robin.
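For illustration, a minimal sketch of the clamping semantics described
above, with entirely made-up names (fake_dev, fake_dma_set_mask,
swiotlb_buffer_end, bus_limit are not the real dma_map_ops interface):
dma_set_mask() would only refuse when DMA is impossible with the
requested mask, and otherwise succeed while recording no more than the
bus/IOMMU can actually handle.

#include <stdint.h>

#define DMA_BIT_MASK(n)  (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct fake_dev {
	uint64_t dma_mask;	/* mask actually recorded for the device */
	uint64_t bus_limit;	/* what the upstream bus/IOMMU can pass */
};

/* Highest physical address of a pretend SWIOTLB bounce buffer. */
static const uint64_t swiotlb_buffer_end = 0x807fffffULL;

static int fake_dma_set_mask(struct fake_dev *dev, uint64_t mask)
{
	/*
	 * Only refuse outright when DMA is impossible: even bouncing can't
	 * help if the bounce buffer itself lies outside the requested mask.
	 */
	if (swiotlb_buffer_end > mask)
		return -1;	/* would be -EIO or similar in the kernel */

	/* Otherwise succeed, but never record more than the bus allows. */
	dev->dma_mask = mask < dev->bus_limit ? mask : dev->bus_limit;
	return 0;
}

int main(void)
{
	struct fake_dev dev = { .dma_mask = 0, .bus_limit = DMA_BIT_MASK(40) };

	/* A driver blindly asking for 64 bits still succeeds, but the
	 * recorded mask is quietly clamped to the 40 bits the bus drives. */
	return fake_dma_set_mask(&dev, DMA_BIT_MASK(64));
}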