On Wed, May 04, 2022 at 11:25:33AM +0200, Loic Poulain wrote:
> On Wed, 4 May 2022 at 10:17, Manivannan Sadhasivam
> <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> >
> > Hi Loic,
> >
> > On Wed, May 04, 2022 at 09:21:20AM +0200, Loic Poulain wrote:
> > > Hi Mani,
> > >
> > > On Mon, 2 May 2022 at 12:42, Manivannan Sadhasivam
> > > <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> > > >
> > > > The endpoint device will only read the context wp when the host rings
> > > > the doorbell.
> > >
> > > Are we sure about this statement? What if we update ctxt_wp while the
> > > device is still processing the previous ring? Is it going to continue
> > > processing the new ctxt_wp or wait for a new doorbell interrupt? What
> > > about burst mode, in which we don't ring at all (ring_db is a no-op)?
> > >
> >
> > Good point. I think my statement was misleading. But this scenario still
> > won't happen as per my understanding. Please see below.
> >
> > > > Moreover, the doorbell write uses writel(). This
> > > > guarantees that the prior writes will be completed before ringing the
> > > > doorbell.
> > >
> > > Yes, but the barrier is there to ensure that the descriptor/ring content
> > > is updated before we actually pass it to device ownership; it's not about
> > > ordering against the doorbell write, but against the coherent memory
> > > writes.
> > >
> >
> > I see a clear data dependency between writing the ring element and updating
> > the context pointer. For instance:
> >
> > ```
> > struct mhi_ring_element *mhi_tre;
> >
> > mhi_tre = ring->wp;
> > /* Populate mhi_tre */
> > ...
> >
> > /* Increment wp */
> > ring->wp += el_size;
> >
> > /* Update ctx wp */
> > ring->ctx_wp = ring->iommu_base + (ring->wp - ring->base);
> > ```
> >
> > This is analogous to:
> >
> > ```
> > Read PTR A;
> > Update PTR A;
> > Increment PTR A;
> > Write PTR A to PTR B;
> > ```
>
> Interesting point, but wouldn't it be more correct to translate it as:
>
> 1. Write PTR A to PTR B (mhi_tre);
> 2. Update PTR B DATA;
> 3. Increment PTR A;
> 4. Write PTR A to PTR C;
>
> In that case, it looks like step 2 has no ordering constraint with steps 3
> & 4, whereas the following guarantees it:
>
> 1. Write PTR A to PTR B (mhi_tre);
> 2. Update PTR B DATA;
> 3. Increment PTR A;
> dma_wmb()
> 4. Write PTR A to PTR C;
>
> To be honest, compiler optimization is beyond my knowledge, so I don't
> know whether a specific compiler arch/version could mess up the
> sequence or not. But this pattern is really close to what is described
> for dma_wmb() usage in Documentation/memory-barriers.txt. That's why I
> challenged this change and would be conservative, keeping the explicit
> barrier.
>

Hmm. Since I was reading the memory model and going through the MHI code, I
_thought_ that this dma_wmb() was redundant. But I missed the fact that the
update of the memory pointed to by "wp" happens indirectly, through a separate
pointer (mhi_tre), so it doesn't qualify as a direct address dependency.

> >
> > Here, because of the data dependency due to "ring->wp", the CPU or compiler
> > won't reorder the instructions. I think that's one of the reasons we never
> > hit any issue due to this.
>
> You may be right here about the implicit ordering guarantee... So if
> you're sure, I think it would deserve an inline comment to explain why
> we don't need a memory barrier as in the 'usual' DMA descriptor update
> sequences.
>

I think the barrier makes sense now. Sorry for the confusion and thanks for the
explanations.

Thanks,
Mani

> Loic
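
As a side note, here is a rough sketch of the sequence being discussed, with
the explicit dma_wmb() kept between populating the element and publishing the
context wp. The struct layouts, field names and the mhi_ring_add_element()
helper below are illustrative placeholders derived from the snippet quoted
above, not the actual mhi/mhi_ep code:

```
#include <linux/kernel.h>	/* cpu_to_le64() and friends */
#include <linux/string.h>	/* memcpy() */
#include <linux/types.h>	/* dma_addr_t, __le32, __le64 */
#include <asm/barrier.h>	/* dma_wmb() */

/* Generic transfer ring element layout, as an illustration */
struct mhi_ring_element {
	__le64 ptr;
	__le32 dword[2];
};

/* Illustrative ring bookkeeping; field names follow the quoted snippet */
struct mhi_ring {
	void *base;		/* virtual address of the ring base */
	void *wp;		/* host-side write pointer (virtual) */
	dma_addr_t iommu_base;	/* bus address of the ring base */
	__le64 *ctx_wp;		/* wp field in the shared context, read by the device */
};

static void mhi_ring_add_element(struct mhi_ring *ring,
				 const struct mhi_ring_element *el)
{
	struct mhi_ring_element *mhi_tre = ring->wp;

	/* Steps 1 + 2: write the element that the device will fetch */
	memcpy(mhi_tre, el, sizeof(*mhi_tre));

	/* Step 3: advance the local write pointer */
	ring->wp += sizeof(*mhi_tre);

	/*
	 * The element was written through mhi_tre, not through ring->wp
	 * itself, so there is no address dependency ordering the element
	 * write against the context wp update below. Keep the explicit
	 * barrier, matching the dma_wmb() pattern described in
	 * Documentation/memory-barriers.txt.
	 */
	dma_wmb();

	/* Step 4: publish the new write pointer for the device to read */
	*ring->ctx_wp = cpu_to_le64(ring->iommu_base + (ring->wp - ring->base));
}
```

The dma_wmb() sits exactly between Loic's steps 3 and 4, ordering the
coherent-memory write of the element against the write that hands the new
wp to the device.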