On Fri, Oct 12, 2018 at 10:51 AM, Laura Abbott <labbott@xxxxxxxxxx> wrote:
> On 10/10/2018 04:33 PM, John Stultz wrote:
>>
>> Since 4.12, much later narrowed down to commit 2a55e7b5e544
>> ("staging: android: ion: Call dma_map_sg for syncing and mapping"),
>> we have seen graphics performance issues on the HiKey960.
>>
>> This was initially confounded by the fact that the out-of-tree
>> DRM driver was using a HiSi custom ION heap which broke with the
>> 4.12 ION ABI changes, so there was lots of suspicion that the
>> performance problems were due to switching to a somewhat simple
>> CMA-based DRM driver for HiKey960. Additionally, as no
>> performance regression was seen w/ the original HiKey board
>> (which is SMP, not big.LITTLE as w/ HiKey960), there was some
>> thought that the out-of-tree EAS code wasn't quite optimized.
>>
>> But after chasing a number of other leads, I found that
>> reverting the ION code to 4.11-era got the majority of the
>> graphics performance back (there may yet be further EAS tweaks
>> needed), which led me to the dma_map_sg change.
>>
>> In talking w/ Laura and Liam, it was suspected that the extra
>> cache operations were causing the trouble. Additionally, I found
>> that part of the reason we didn't see this w/ the original
>> HiKey board is that its (proprietary blob) GL code uses ion_mmap
>> and ion_map_dma_buf is called very rarely, whereas with
>> HiKey960, the (also proprietary blob) GL code calls
>> ion_map_dma_buf much more frequently via the kernel driver.
>>
>> Anyway, with the cause of the performance regression isolated,
>> I've tried to find a way to improve the performance of the
>> current code.
>>
>> This approach, which I've mostly copied from the drm_prime
>> implementation, is to try to track the direction we're mapping
>> the buffers so we can avoid calling dma_map/unmap_sg on every
>> ion_map_dma_buf/ion_unmap_dma_buf call, and instead try to do
>> the work in attach/detach paths.
>>
>> I'm not 100% sure of the correctness here, so close review would
>> be good, but it gets the performance back to being similar to
>> reverting the ION code to the 4.11-era.
>>
>> Feedback would be greatly appreciated!
>>
...
>> @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct
>> dma_buf_attachment *attachment,
>>                               struct sg_table *table,
>>                               enum dma_data_direction direction)
>>  {
>> -       dma_unmap_sg(attachment->dev, table->sgl, table->nents,
>> direction);
>
>
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that. I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)
>
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

Ok. Thanks so much for the feedback and the suggestion. I'll try to
look into dma_attrs here shortly.

thanks
-john
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
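
For readers following the thread, the two options discussed above can be
compared with a small userspace toy model. This is only a sketch under
stated assumptions: it is not kernel or ION code, every toy_* name is
hypothetical, and the counters merely stand in for dma_map_sg(),
dma_unmap_sg() and the CPU cache maintenance they may trigger. Option 1
mirrors the direction-tracking idea from the patch description (map once,
reuse while the direction matches, unmap at detach). Option 2 mirrors the
review suggestion (map on every call, but let the caller pass a flag,
modeled on DMA_ATTR_SKIP_CPU_SYNC, to skip the sync).

```c
#include <assert.h>

/* Toy userspace model only: none of these names are kernel or ION API. */
enum toy_dir { TOY_NONE = 0, TOY_TO_DEVICE, TOY_FROM_DEVICE };

/* Hypothetical stand-in for the kernel's DMA_ATTR_SKIP_CPU_SYNC attribute. */
#define TOY_ATTR_SKIP_CPU_SYNC 0x1UL

struct toy_attachment {
	enum toy_dir dir; /* direction of the cached mapping, TOY_NONE if none */
	int maps;         /* times the stand-in for dma_map_sg() ran */
	int unmaps;       /* times the stand-in for dma_unmap_sg() ran */
	int cache_syncs;  /* simulated cache maintenance done while mapping */
};

/*
 * Option 1: map once per attachment and reuse the mapping for as long as
 * the requested direction matches. Returns 0 on success, -1 on error.
 */
int toy_map_cached(struct toy_attachment *att, enum toy_dir dir)
{
	if (dir == TOY_NONE)
		return -1;
	if (att->dir == dir)
		return 0;  /* cached mapping: no map, no sync */
	if (att->dir != TOY_NONE)
		return -1; /* a different direction is already mapped */
	att->maps++;
	att->cache_syncs++;
	att->dir = dir;
	return 0;
}

/* With option 1 the real unmap is deferred until detach. */
void toy_detach(struct toy_attachment *att)
{
	if (att->dir != TOY_NONE) {
		att->unmaps++;
		att->dir = TOY_NONE;
	}
}

/*
 * Option 2: map on every call, but skip the simulated cache sync when the
 * caller passes TOY_ATTR_SKIP_CPU_SYNC. Returns 0 on success, -1 on error.
 */
int toy_map_attrs(struct toy_attachment *att, enum toy_dir dir,
		  unsigned long attrs)
{
	if (dir == TOY_NONE)
		return -1;
	att->maps++;
	if (!(attrs & TOY_ATTR_SKIP_CPU_SYNC))
		att->cache_syncs++;
	return 0;
}
```

In this model, repeated toy_map_cached() calls with the same direction
cost one map and one sync in total, while repeated toy_map_attrs() calls
with the skip flag still map each time but never sync. The model also
makes the semantic concern from the review visible: with option 1 nothing
is unmapped until toy_detach() runs.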