On Wed, 4 May 2022 at 10:17, Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> wrote: > > Hi Loic, > > On Wed, May 04, 2022 at 09:21:20AM +0200, Loic Poulain wrote: > > Hi Mani, > > > > On Mon, 2 May 2022 at 12:42, Manivannan Sadhasivam > > <manivannan.sadhasivam@xxxxxxxxxx> wrote: > > > > > > The endpoint device will only read the context wp when the host rings > > > the doorbell. > > > > Are we sure about this statement? what if we update ctxt_wp while the > > device is still processing the previous ring? is it going to continue > > processing the new ctxt_wp or wait for a new doorbell interrupt? what > > about burst mode in which we don't ring at all (ring_db is no-op)? > > > > Good point. I think my statement was misleading. But still this scenario won't > happen as per my undestanding. Please see below. > > > > And moreover the doorbell write is using writel(). This > > > guarantess that the prior writes will be completed before ringing > > > doorbell. > > > > Yes but the barrier is to ensure that descriptor/ring content is > > updated before we actually pass it to device ownership, it's not about > > ordering with the doorbell write, but the memory coherent ones. > > > > I see a clear data dependency between writing the ring element and updating the > context pointer. For instance, > > ``` > struct mhi_ring_element *mhi_tre; > > mhi_tre = ring->wp; > /* Populate mhi_tre */ > ... > > /* Increment wp */ > ring->wp += el_size; > > /* Update ctx wp */ > ring->ctx_wp = ring->iommu_base + (ring->wp - ring->base); > ``` > > This is analogous to: > > ``` > Read PTR A; > Update PTR A; > Increment PTR A; > Write PTR A to PTR B; > ``` Interesting point, but shouldn't it be more correct to translate it as: 1. Write PTR A to PTR B (mhi_tre); 2. Update PTR B DATA; 3. Increment PTR A; 4. Write PTR A to PTR C; In that case, it looks like line 2. has no ordering constraint with 3. & 4? whereas the following guarantee it: 1. Write PTR A to PTR B (mhi_tre); 2. Update PTR B DATA; 3. Increment PTR A; dma_wmb() 4. Write PTR A to PTR C; To be honest, compiler optimization is beyond my knowledge, so I don't know if a specific compiler arch/version could be able to mess up the sequence or not. But this pattern is really close to what is described for dma_wmb() usage in Documentation/memory-barriers.txt. That's why I challenged this change and would be conservative, keeping the explicit barrier. > > Here, because of the data dependency due to "ring->wp", the CPU or compiler > won't be ordering the instructions. I think that's one of the reason we never > hit any issue due to this. You may be right here about the implicit ordering guarantee... So if you're sure, I think it would deserve an inline comment to explain why we don't need a memory barrier as in the 'usual' dma descriptor update sequences. Loic