On Tue, Apr 14, 2020 at 9:11 AM Ørjan Eide <orjan.eide@xxxxxxx> wrote:
>
> On Tue, Apr 14, 2020 at 04:28:10PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Apr 14, 2020 at 04:18:47PM +0200, Ørjan Eide wrote:
> > > Only sync the sg-list of an Ion dma-buf attachment when the attachment
> > > is actually mapped on the device.
> > >
> > > dma-bufs may be synced at any time. A sync can be triggered from user
> > > space via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on
> > > when syncs may be attempted, and dma_buf_end_cpu_access() and
> > > dma_buf_begin_cpu_access() may not be paired.
> > >
> > > Since the sg_list's dma_address isn't set up until the buffer is used
> > > on the device, and dma_map_sg() is called on it, the dma_address will be
> > > NULL if sync is attempted on the dma-buf before it's mapped on a device.
> > >
> > > Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops
> > > into the dma_direct code")) this was a problem, as the dma-api (at least
> > > the swiotlb_dma_ops on arm64) would use the potentially invalid
> > > dma_address. How that failed depended on how the device handled physical
> > > address 0. If 0 was a valid address of physical ram, that page would get
> > > flushed a lot, while the actual pages in the buffer would not get synced
> > > correctly. If 0 was an invalid physical address, it could cause a fault
> > > and trigger a crash.
> > >
> > > In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct:
> > > merge swiotlb_dma_ops into the dma_direct code"), as this moved the
> > > dma-api to use the page pointer in the sg_list, and (for Ion buffers at
> > > least) this will always be valid if the sg_list exists at all.
> > >
> > > But this issue was re-introduced in v5.3 by commit 449fa54d6815
> > > ("dma-direct: correct the physical addr in
> > > dma_direct_sync_sg_for_cpu/device"), which moves the dma-api back to the
> > > old behaviour and picks the dma_address that may be invalid.
> > >
> > > dma-buf core doesn't ensure that the buffer is mapped on the device, and
> > > thus has a valid sg_list, before calling the exporter's
> > > begin_cpu_access.
> > >
> > > Signed-off-by: Ørjan Eide <orjan.eide@xxxxxxx>
> > > ---
> > >  drivers/staging/android/ion/ion.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > Resubmitting without the disclaimer, sorry about that.
> > >
> > > This seems to be part of a bigger issue where dma-buf exporters assume
> > > that their dma-buf begin_cpu_access and end_cpu_access callbacks have a
> > > certain guaranteed behavior, which isn't ensured by dma-buf core.
> > >
> > > This patch fixes this in ion only, but it also needs to be fixed for
> > > other exporters, either handled like this in each exporter, or in
> > > dma-buf core before calling into the exporters.
> > >
> > > diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> > > index 38b51eace4f9..7b752ba0cb6d 100644
> > > --- a/drivers/staging/android/ion/ion.c
> > > +++ b/drivers/staging/android/ion/ion.c
> >
> > Now that we have the dma-buf heaps stuff in the tree, do we even need
> > the ion code in the kernel anymore? Can't we delete it now?
>
> It looks like the new dma-heaps have the same issue as ion. The
> heap-helpers also do dma_sync_sg_for_device() unconditionally on
> end_cpu_access, which may happen before dma_map_sg(), leading to use of
> the 0 dma_address in the sg list of a yet-unmapped attachment.

Yea, the dma-buf heaps code came from the ION logic, so it likely has
the same faults.
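For the heap-helpers, the equivalent guard would look something like
the sketch below (illustrative only; the "mapped" flag doesn't exist in
the helpers today, and would have to be set in map_dma_buf and cleared
in unmap_dma_buf, the same shape as the ion patch above):

	static int dma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
						   enum dma_data_direction direction)
	{
		struct heap_helper_buffer *buffer = dmabuf->priv;
		struct dma_heaps_attachment *a;

		mutex_lock(&buffer->lock);
		list_for_each_entry(a, &buffer->attachments, list) {
			/* Skip attachments that have not been mapped on
			 * a device yet; their sg-list has no valid
			 * dma_address to sync. ("mapped" is a
			 * hypothetical field here.) */
			if (!a->mapped)
				continue;
			dma_sync_sg_for_device(a->dev, a->table.sgl,
					       a->table.nents, direction);
		}
		mutex_unlock(&buffer->lock);

		return 0;
	}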
> It could be fixed in dma-heaps just like this patch does for ion. Is
> this patch a valid way to fix this problem? Or should this rather be
> handled in dma-buf core, by tracking the mapped state of attachments
> there?

In the short-term, I'd definitely prefer to have a fix to dmabuf heaps
rather than ION, but I also agree that long term it probably shouldn't
just be up to the dma-buf exporter (as there are other dmabuf exporters
that may have it wrong too), and that we need to address some DMA API
expectations/limitations to better handle multiple device pipelines.
(I actually gave a talk last fall on some of the issues I see around
it: https://youtu.be/UsEVoWD_o0c )
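The core-side tracking could look roughly like this (purely an
illustration of the idea; the "mapped" field is hypothetical, dma-buf
core does not maintain such state today):

	/* Hypothetical: dma-buf core keeps a per-attachment "mapped"
	 * flag that exporters could consult before doing any cache
	 * maintenance on the attachment's sg-list. */
	struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
						enum dma_data_direction direction)
	{
		struct sg_table *sgt;

		sgt = attach->dmabuf->ops->map_dma_buf(attach, direction);
		if (!IS_ERR_OR_NULL(sgt))
			attach->mapped = true;	/* hypothetical field */
		return sgt;
	}

	void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
				      struct sg_table *sg_table,
				      enum dma_data_direction direction)
	{
		attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, direction);
		/* sg-list dma_addresses are no longer valid */
		attach->mapped = false;
	}

thanks
-john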