Re: [PATCH rdma-core 06/14] i40iw: Get rid of unique barrier macros

On Wed, Mar 01, 2017 at 04:05:06PM -0700, Jason Gunthorpe wrote:
> On Wed, Mar 01, 2017 at 04:14:20PM -0600, Shiraz Saleem wrote:
> > > Is there DMA occurring to shadow_area?
> > 
> > The shadow area contains status variables which are read by SW and
> > updated by the PCI device.
> 
> So the device is DMA'ing to it, and the driver is reading DMA memory..
> 
> > > What can go wrong if it executes like this?
> > > 
> > > get_64bit_val(qp->shadow_area, I40IW_BYTE_0, &temp);
> > > udma_to_device_barrier(); /* make sure WQE is populated before valid bit is set */
> > > set_64bit_val(wqe, I40IW_BYTE_24, header);
> > > udma_to_device_barrier();
> > 
> > We need strict ordering that ensures write of the WQE completes before 
> > read of the shadow area.
> 
> > This ensures the value read from the shadow can be used to determine
> > if a DB ring is needed. If the shadow area is read first, the
> > algorithm, in certain cases, would not ring the DB when it should
> > and the HW may go idle with work requests posted.
> 
> This still is not making a lot of sense to me.. I really need to see a
> ladder diagram to understand your case.
> 
> Here is an example, I think what you are saying is: The HW could have
> fetched valid = 0 and stopped the queue and the driver needs to
> doorbell it to wake it up again. However, the driver optimizes away
> the doorbell rings in certain cases based on reading a DMA result.
> 
> So here is a possible ladder diagram:
> 
> CPU                         DMA DEVICE
>                             Issue READ#1 of valid bit
>  Respond to READ#1
>  SFENCE
>  set_valid_bit
>  MEFENCE
>  read_tail
>                             Receive READ#1 response with valid bit unset
>                             Issue DMA WRITE to shadow_area indicating STOPPED
>  DMA WRITE arrives
> 
> And the version where the DMA is seen:
> 
> CPU                         DMA DEVICE
>                             Issue READ#1 of valid bit
>  SFENCE
>  Respond to READ#1
>  set_valid_bit
>  MEFENCE
>                             Receive READ#1 response with valid bit unset
>                             Issue DMA WRITE to shadow_area indicating STOPPED
>  DMA WRITE arrives
>  read_tail
> 
> These diagrams attempt to show that the DMA device reads the valid bit
> then DMA's back to the shadow_area depending on what it read.


This is not quite how our DB logic works. There are additional HW steps and nuances
in the flow. Unfortunately, explaining this would require disclosing details of our
internal HW flow for the DB logic, which we are unable to do at this time.

The ordering is a HW requirement. I.e., the write of the WQE valid bit __must__
precede the read of the shadow-area tail.

> 
> I get the feeling this approach requires MFENCE to do something it
> doesn't...

MFENCE guarantees that the load is not reordered before the store, which is
why we are using it.

We understand this is a unique requirement specific to our design, but it is necessary.

The rest of the changes look ok.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


