On Thu, Jul 02, 2020 at 03:10:00PM +0200, Daniel Vetter wrote:
> On Wed, Jul 01, 2020 at 02:15:24PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jul 01, 2020 at 05:42:21PM +0200, Daniel Vetter wrote:
> > > > >> All you need is the ability to stop wait for ongoing accesses to end and
> > > > >> make sure that new ones grab a new mapping.
> > > > >
> > > > > Swap and flush isn't a general HW ability either..
> > > > >
> > > > > I'm unclear how this could be useful, it is guaranteed to corrupt
> > > > > in-progress writes?
> > > > >
> > > > > Did you mean pause, swap and resume? That's ODP.
> > > >
> > > > Yes, something like this. And good to know, never heard of ODP.
> > >
> > > Hm I thought ODP was full hw page faults at an individual page
> > > level,
> >
> > Yes
> >
> > > and this stop&resume is for the entire nic. Under the hood both apply
> > > back-pressure on the network if a transmission can't be received,
> > > but
> >
> > NIC's don't do stop and resume, blocking the Rx pipe is very
> > problematic and performance destroying.
> >
> > The strategy for something like ODP is more complex, and so far no NIC
> > has deployed it at any granularity larger than per-page.
> >
> > > So since Jason really doesn't like dma_fence much I think for rdma
> > > synchronous it is. And it shouldn't really matter, since waiting for a
> > > small transaction to complete at rdma wire speed isn't really that
> > > long an operation.
> >
> > Even if DMA fence were to somehow be involved, how would it look?
>
> Well above you're saying it would be performance destroying, but let's
> pretend that's not a problem :-) Also, I have no clue about rdma, so this
> is really just the flow we have on the gpu side.

I see, no, this is not workable. The command flow in RDMA is not at
all like GPU - what you are proposing is a global 'stop the whole
chip' of the Tx and Rx flows for an undetermined time.
Not feasible.

What we can do is use ODP techniques and pause only the MR attached
to the DMA buf with the process you outline below. This is not so
hard to implement.

> 3. rdma driver worker gets busy to restart rx:
>    1. lock all dma-buf that are currently in use (dma_resv_lock).
>       thanks to ww_mutex deadlock avoidance this is possible

Why all? Why not just lock the one that was invalidated to restore
the mappings? That is some artifact of the GPU approach?

And why is this done with work queues and locking instead of a
callback saying the buffer is valid again?

Jason
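For readers not familiar with the GPU side: the "lock all dma-buf ...
thanks to ww_mutex deadlock avoidance" step quoted above follows the
standard ww_mutex acquire pattern. A rough, non-runnable sketch against
the in-kernel dma_resv API (the driver's buffer list and its iterator
are hypothetical, and the slow-path bookkeeping is simplified):

```c
/* Sketch only: multi-object locking with ww_mutex backoff.
 * drv->bufs and the unlock helper are hypothetical driver state.
 */
#include <linux/dma-resv.h>
#include <linux/ww_mutex.h>

static int lock_all_bufs(struct my_driver *drv)
{
	struct ww_acquire_ctx ctx;
	struct my_buf *b;
	int ret;

	ww_acquire_init(&ctx, &reservation_ww_class);
retry:
	list_for_each_entry(b, &drv->bufs, node) {
		ret = dma_resv_lock(b->dmabuf->resv, &ctx);
		if (ret == -EDEADLK) {
			/* Back off: drop every lock taken so far, then
			 * retry; the acquire ctx's stamp guarantees
			 * forward progress. (Real code would also
			 * dma_resv_lock_slow() the contended object
			 * before retrying.) */
			unlock_bufs_before(drv, b, &ctx);
			goto retry;
		}
		if (WARN_ON(ret))
			return ret;
	}
	ww_acquire_done(&ctx);
	return 0;
}
```

The point of the pattern is that taking an arbitrary set of dma_resv
locks cannot deadlock: a contended acquisition returns -EDEADLK to the
younger context, which releases and retries.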