Hi Daniel / Sima,

On Thu, Jan 18, 2024 at 14:59 +0100, Daniel Vetter wrote:
> On Thu, Jan 18, 2024 at 02:56:31PM +0100, Daniel Vetter wrote:
> > On Mon, Jan 15, 2024 at 01:54:27PM +0100, Paul Cercueil wrote:
> > > Hi Daniel / Sima,
> > >
> > > On Tue, Jan 09, 2024 at 14:01 +0100, Daniel Vetter wrote:
> > > > On Tue, Jan 09, 2024 at 12:06:58PM +0100, Paul Cercueil wrote:
> > > > > Hi Daniel / Sima,
> > > > >
> > > > > On Mon, Jan 08, 2024 at 20:19 +0100, Daniel Vetter wrote:
> > > > > > On Mon, Jan 08, 2024 at 05:27:33PM +0100, Paul Cercueil wrote:
> > > > > > > On Mon, Jan 08, 2024 at 16:29 +0100, Daniel Vetter wrote:
> > > > > > > > On Mon, Jan 08, 2024 at 03:21:21PM +0100, Paul Cercueil wrote:
> > > > > > > > > Hi Daniel (Sima?),
> > > > > > > > >
> > > > > > > > > On Mon, Jan 08, 2024 at 13:39 +0100, Daniel Vetter wrote:
> > > > > > > > > > On Mon, Jan 08, 2024 at 01:00:55PM +0100, Paul Cercueil wrote:
> > > > > > > > > > > +static void ffs_dmabuf_signal_done(struct ffs_dma_fence *dma_fence, int ret)
> > > > > > > > > > > +{
> > > > > > > > > > > +	struct ffs_dmabuf_priv *priv = dma_fence->priv;
> > > > > > > > > > > +	struct dma_fence *fence = &dma_fence->base;
> > > > > > > > > > > +
> > > > > > > > > > > +	dma_fence_get(fence);
> > > > > > > > > > > +	fence->error = ret;
> > > > > > > > > > > +	dma_fence_signal(fence);
> > > > > > > > > > > +
> > > > > > > > > > > +	dma_buf_unmap_attachment(priv->attach, dma_fence->sgt, dma_fence->dir);
> > > > > > > > > > > +	dma_fence_put(fence);
> > > > > > > > > > > +	ffs_dmabuf_put(priv->attach);
> > > > > > > > > >
> > > > > > > > > > So this can in theory take the dma_resv lock, and if the usb
> > > > > > > > > > completion isn't an unlimited worker this could hold up
> > > > > > > > > > completion of future dma_fences, resulting in a deadlock.
> > > > > > > > > >
> > > > > > > > > > It needs to be checked how usb works; if stalling indefinitely
> > > > > > > > > > in the io_complete callback can hold up the usb stack, you
> > > > > > > > > > need to:
> > > > > > > > > >
> > > > > > > > > > - drop dma_fence_begin/end_signalling annotations in here
> > > > > > > > > > - pull out the unref stuff into a separate preallocated
> > > > > > > > > >   worker (or at least the final unrefs for ffs_dma_buf).
> > > > > > > > >
> > > > > > > > > Only ffs_dmabuf_put() can attempt to take the dma_resv and
> > > > > > > > > would have to be in a worker, right? Everything else would be
> > > > > > > > > inside the dma_fence_begin/end_signalling() annotations?
> > > > > > > >
> > > > > > > > Yup.
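For reference, here is roughly how I understand that split - a sketch
only, assuming a "work" member is added to struct ffs_dma_fence and set
up with INIT_WORK() when the fence is created:

static void ffs_dmabuf_cleanup_worker(struct work_struct *work)
{
	struct ffs_dma_fence *dma_fence =
		container_of(work, struct ffs_dma_fence, work);
	struct dma_buf_attachment *attach = dma_fence->priv->attach;

	/* We are outside the fence signalling critical section here,
	 * so taking dma_resv_lock (e.g. from the final
	 * ffs_dmabuf_put()) cannot deadlock against the fence. */
	dma_fence_put(&dma_fence->base);	/* drop the transfer's reference */
	ffs_dmabuf_put(attach);

	/* (What to do about dma_buf_unmap_attachment() is a separate
	 * question, see the rest of the thread.) */
}

static void ffs_dmabuf_signal_done(struct ffs_dma_fence *dma_fence, int ret)
{
	struct dma_fence *fence = &dma_fence->base;
	bool cookie;

	/* Annotate the critical section so lockdep flags any
	 * dma_resv_lock (or similar) taken before the signal. */
	cookie = dma_fence_begin_signalling();
	fence->error = ret;
	dma_fence_signal(fence);
	dma_fence_end_signalling(cookie);

	/* The final unrefs may take dma_resv_lock, so defer them to
	 * the preallocated work item instead of doing them here. */
	schedule_work(&dma_fence->work);
}

The reference held for the transfer is handed over to the worker, which
drops it once it has run.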
> > > > > > > > Also I noticed that unlike the iio patches you don't have the
> > > > > > > > dma_buf_unmap here in the completion path (or am I blind?),
> > > > > > > > which helps a lot with avoiding trouble.
> > > > > > >
> > > > > > > They both call dma_buf_unmap_attachment() in the "signal done"
> > > > > > > callback; the only difference I see is that it is called after
> > > > > > > the dma_fence_put() in the iio patches, while it's called before
> > > > > > > dma_fence_put() here.
> > > > > >
> > > > > > I was indeed blind ...
> > > > > >
> > > > > > So the trouble is this won't work, because:
> > > > > > - dma_buf_unmap_attachment() requires dma_resv_lock. This is a
> > > > > >   somewhat recent-ish change from 47e982d5195d ("dma-buf: Move
> > > > > >   dma_buf_map_attachment() to dynamic locking specification"), so
> > > > > >   maybe old kernel or you don't have full lockdep enabled to get
> > > > > >   the right splat.
> > > > > > - dma_fence critical sections forbid dma_resv_lock.
> > > > > >
> > > > > > Which means you need to move this out, but then there's the
> > > > > > potential cache management issue. Current gpu drivers just kinda
> > > > > > ignore that, because it doesn't matter for their current
> > > > > > use-cases: they all cache the mapping for about as long as the
> > > > > > attachment exists. You might want to do the same, unless that
> > > > > > somehow breaks a use-case you have; I have no idea about that.
> > > > > >
> > > > > > If something breaks with unmap_attachment moved out of the fence
> > > > > > handling, then I guess it's high time to add separate
> > > > > > cache-management-only functions to dma_buf (and that's probably
> > > > > > going to be quite some wiring up; not sure how easy that would be
> > > > > > to do, nor what exactly the interface should look like).
> > > > >
> > > > > Ok. Then I'll just cache the mapping for now, I think.
> > > >
> > > > Yeah I think that's simplest. I did ponder a bit and I don't think
> > > > it'd be too much pain to add the cache-management functions for
> > > > device attachments/mappings. But it would be quite some typing ...
> > > > -Sima
> > >
> > > It looks like I actually do have some hardware which requires the
> > > cache management. If I cache the mappings in both my IIO and USB
> > > code, it works fine on my ZedBoard, but it doesn't work on my ZCU102.
> > >
> > > (Or maybe it's something else? What I get from USB in that case is a
> > > stream of zeros; I'd expect it to be more like a stream of
> > > garbage/stale data.)
> > >
> > > So, change of plans; I will now unmap the attachment in the cleanup
> > > worker after the fence is signalled, and add a warning comment before
> > > the end of the fence critical section about the need to do cache
> > > management before the signal.
> > >
> > > Does that work for you?
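For completeness, "cache the mappings" as discussed above would look
roughly like this - a sketch only, with the helpers and the sgt/dir
fields made up for illustration:

static int ffs_dmabuf_get_mapping(struct ffs_dmabuf_priv *priv,
				  enum dma_data_direction dir)
{
	struct sg_table *sgt;

	if (priv->sgt)
		return 0;	/* already mapped, reuse it */

	/* The _unlocked variant takes dma_resv_lock internally, as per
	 * the dynamic locking rules from 47e982d5195d. */
	sgt = dma_buf_map_attachment_unlocked(priv->attach, dir);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	priv->sgt = sgt;
	priv->dir = dir;
	return 0;
}

static void ffs_dmabuf_put_mapping(struct ffs_dmabuf_priv *priv)
{
	/* Called at detach time only, never from the fence path. */
	if (priv->sgt)
		dma_buf_unmap_attachment_unlocked(priv->attach, priv->sgt,
						  priv->dir);
	priv->sgt = NULL;
}

With the mapping cached like this, the fence path never needs dma_resv
for map/unmap; only attach/detach do.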
> > The trouble is, I'm not sure this works for you. If you rely on the
> > fences, and you have to do cache management in between dma
> > operations, then doing the unmap sometime later will only mostly
> > paper over the issue, but not consistently.
> >
> > I think that's really bad, because the bugs this will cause are very
> > hard to track down and, with the current infrastructure, impossible
> > to fix.
> >
> > Imo cache the mappings, and then fix the cache management bug
> > properly.
> >
> > If you want an interim solution that isn't blocked on the dma-buf
> > cache management api addition, the only thing that works is doing
> > the operations synchronously in the ioctl call. Then you don't need
> > fences, and you can guarantee that the unmap has finished before
> > userspace proceeds.
> >
> > With the dma_fences you can't guarantee that; it's just pure luck.
>
> Maybe a follow-up: double-check that you really need the cache
> management between the dma operations from 2 different devices, and
> not for the cpu access that you then probably do to check the result.
>
> Because if the issue is just cpu access, then protecting the cpu
> access needs to use the begin/end_cpu_access dma-buf functions (or
> the corresponding ioctl if you use mmap from userspace) anyway, and
> that should sort out any issues you have for cpu access.
>
> Just to make sure we're not needlessly trying to fix something that
> isn't actually the problem.

I am not doing any CPU access - I'm just attaching the same DMABUF to
IIO and USB and using the new IOCTLs to transfer data.

Can I just roll my own cache management then, using
dma_sync_sg_for_cpu/device? I did a quick-and-dirty check with it, and
it seems to make things work with cached mappings.

> -Sima

Cheers,
-Paul
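P.S.: By "roll my own" I mean something along these lines - a sketch
only; the helper names and the dev field are made up, and the mapping
is assumed to be cached as sketched above:

static void ffs_dmabuf_sync_for_device(struct ffs_dmabuf_priv *priv)
{
	/* Hand the buffer over to the device before it starts DMA:
	 * writes back / invalidates the CPU caches as needed. */
	dma_sync_sg_for_device(priv->dev, priv->sgt->sgl,
			       priv->sgt->orig_nents, priv->dir);
}

static void ffs_dmabuf_sync_for_cpu(struct ffs_dmabuf_priv *priv)
{
	/* Take the buffer back once the DMA is done, so the next user
	 * doesn't see stale cache lines. */
	dma_sync_sg_for_cpu(priv->dev, priv->sgt->sgl,
			    priv->sgt->orig_nents, priv->dir);
}

That is, a sync_for_device on a device's own cached mapping before each
transfer is queued, and a sync_for_cpu once that transfer completes.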