On Mon, 2015-07-06 at 16:33 +0100, Chris Wilson wrote:
> On Mon, Jul 06, 2015 at 05:29:39PM +0200, Daniel Vetter wrote:
> > On Mon, Jul 06, 2015 at 03:57:44PM +0100, Chris Wilson wrote:
> > > On Mon, Jul 06, 2015 at 05:50:37PM +0300, Imre Deak wrote:
> > > > We have 3 types of DMA mappings for GEM objects:
> > > > 1. physically contiguous for stolen and for objects needing contiguous
> > > >    memory
> > > > 2. DMA-buf mappings imported via a DMA-buf attach operation
> > > > 3. SG DMA mappings for shmem backed and userptr objects
> > > >
> > > > For 1. and 2. the lifetime of the DMA mapping matches the lifetime of the
> > > > corresponding backing pages and so in practice we create/release the
> > > > mapping in the object's get_pages/put_pages callback.
> > > >
> > > > For 3. the lifetime of the mapping matches that of any existing GPU binding
> > > > of the object, so we'll create the mapping when the object is bound to
> > > > the first vma and release the mapping when the object is unbound from its
> > > > last vma.
> > > >
> > > > Since the object can be bound to multiple vmas, we can end up creating a
> > > > new DMA mapping in the 3. case even if the object already had one. This
> > > > is not allowed by the DMA API and can lead to leaked mapping data and
> > > > IOMMU memory space starvation in certain cases. For example HW IOMMU
> > > > drivers (intel_iommu) allocate a new range from their memory space
> > > > whenever a mapping is created, silently overriding a pre-existing
> > > > mapping.
> >
> > How does this happen? Essentially list_empty(obj->vmas) ==
> > !dma_mapping_exists should hold for objects of the 3rd type. I don't
> > understand how this is broken in the current code. There were definitely
> > versions of the ppgtt code where this wasn't working properly, but I
> > thought we've fixed that up again.
>
> Every g/ppgtt binding remapped the obj->pages through the iommu.
> Even with the DMAR disabled, we still pay the cpu cost of sw iommu (which is
> itself an annoying kernel bug that you can't disable).
>
> > > > Fix this by adding new callbacks to create/release the DMA mapping. This
> > > > way we can use the has_dma_mapping flag for objects of the 3. case also
> > > > (so far the flag was only used for the 1. and 2. case) and skip creating
> > > > a new mapping if one exists already.
> > > >
> > > > Note that I also thought about simply creating/releasing the mapping
> > > > when get_pages/put_pages is called. However since creating a DMA mapping
> > > > may have associated resources (at least in case of HW IOMMU) it does
> > > > make sense to release these resources as early as possible. We can
> > > > release the DMA mapping as soon as the object is unbound from the last
> > > > vma, before we drop the backing pages, hence it's worth keeping the two
> > > > operations separate.
> > > >
> > > > I noticed this issue by enabling DMA debugging, which got disabled after
> > > > a while due to its internal mapping tables getting full. It also reported
> > > > errors in connection to random other drivers that did a DMA mapping for
> > > > an address that was previously mapped by i915 but was never released.
> > > > Besides these diagnostic messages and the memory space starvation
> > > > problem for IOMMUs, I'm not aware of this causing a real issue.
> > >
> > > Nope, it is much much simpler. Since we only do the dma prepare/finish
> > > from inside get_pages/put_pages, we can put the calls there. The only
> > > caveat there is userptr worker, but that can be easily fixed up.
> >
> > I do kinda like the distinction between just grabbing the backing storage
> > and making it accessible to the hw. Small one, but I think it does help if
> > we keep these two maps separate. Now the function names otoh are
> > super-confusing, that I agree with.
>
> But that is the raison-d'etre of get_pages().
> We call it precisely when
> we want the backing storage available to the hw. We relaxed that for
> set-domain to avoid one type of bug, and stolen/dma-buf have their own
> notion of dma mapping. userptr is the odd one out due to its worker
> asynchronously grabbing the pages.

Isn't the DMA mapping operation more tied to binding the object to a
VMA? As far as I can see we call put_pages only when destroying the
object (or attaching a physically contiguous mapping to it) and that's
because at that point we also give up on the content of the buffer.
Otherwise we just do unbinding when reclaiming memory. At this point it
makes sense to release the DMA mapping independently of releasing the
buffer contents.

--Imre

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
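
For illustration only, here is a rough userspace model of the lifetime rule
argued for in this thread for the 3. case: create the DMA mapping when the
object is bound to its first vma, skip creation if has_dma_mapping is already
set, and release the mapping when the object is unbound from its last vma.
This is not the actual patch or i915 code. Every sketch_* name and the struct
layout are made up for the example, and the map/unmap helpers only count calls
where a real driver would presumably call dma_map_sg()/dma_unmap_sg().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GEM object; not the real i915 structure. */
struct sketch_obj {
	unsigned int bind_count;    /* number of vmas the object is bound to */
	bool has_dma_mapping;       /* true while a DMA mapping exists */
	unsigned int maps_created;  /* bookkeeping for this example only */
	unsigned int maps_released; /* bookkeeping for this example only */
};

/* Assumed placeholders for the real map/unmap calls; they only count. */
void sketch_dma_map(struct sketch_obj *obj)
{
	obj->maps_created++;
}

void sketch_dma_unmap(struct sketch_obj *obj)
{
	obj->maps_released++;
}

/* Create the mapping unless one already exists. */
void sketch_prepare_dma(struct sketch_obj *obj)
{
	if (obj->has_dma_mapping)
		return;
	sketch_dma_map(obj);
	obj->has_dma_mapping = true;
}

/* Release the mapping if there is one. */
void sketch_finish_dma(struct sketch_obj *obj)
{
	if (!obj->has_dma_mapping)
		return;
	sketch_dma_unmap(obj);
	obj->has_dma_mapping = false;
}

/* Binding to a vma: only the first binding ends up creating a mapping. */
void sketch_bind_vma(struct sketch_obj *obj)
{
	sketch_prepare_dma(obj);
	obj->bind_count++;
}

/* Unbinding from a vma: the last unbind releases the mapping. */
void sketch_unbind_vma(struct sketch_obj *obj)
{
	assert(obj->bind_count > 0);
	if (--obj->bind_count == 0)
		sketch_finish_dma(obj);
}
```

In this model, binding a zero-initialized object twice creates one mapping,
and the mapping is released only by the second unbind, independently of
whatever happens to the backing pages afterwards.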