Re: CX314A WCE error: WR_FLUSH_ERR

Doug Ledford <dledford@xxxxxxxxxx> · Wed, 21 Aug 2019 14:47:02 -0400

On Wed, 2019-08-21 at 23:38 +0800, Liu, Changcheng wrote:
> On 09:36 Wed 21 Aug, Tom Talpey wrote:
> > On 8/21/2019 8:09 AM, Liu, Changcheng wrote:
> > > Hi all,
> > >     On one system, I frequently hit "IBV_WC_WR_FLUSH_ERR"
> > > in the WCE (work completion element) polled from the completion
> > > queue bound to the RQ (Receive Queue).
> > >     Does anyone have an idea how to debug this
> > > "IBV_WC_WR_FLUSH_ERR" problem?
> > > 
> > >     With a CX314A/40Gb NIC, I hit this error when using the RC
> > > transport type with only Send operation (IBV_WR_SEND) WRs (work
> > > requests) on the SQ (Send Queue).
> > >     Every WR has only one SGE (scatter/gather element) and all the
> > > SGEs on the RQ have the same size. The SGE size in an SQ WR is not
> > > greater than the SGE size in an RQ WR.
> > > 
> > >    There’s one explanation about IBV_WC_WR_FLUSH_ERR on page 114
> > > in the "RDMA Aware Networks Programming User Manual" 
> > > http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
> > >    But I still don't understand it well. How can I trigger this
> > > error with a short demo program?
> > >    "
> > >      IBV_WC_WR_FLUSH_ERR
> > >      This event is generated when an invalid remote error is
> > > thrown when the responder detects an
> > >      invalid request. It may be that the operation is not
> > > supported by the request queue or there is
> > >      insufficient buffer space to receive the request.
> > >    "
> > 
> > The most common reason for a flushed work request is loss of
> > the connection to the remote peer. This can be caused by any
> > number of conditions.
> Good direction. I'll debug along these lines first.
> > The second-most common is a programming error in the upper
> > layer protocol. A shortage of posted receives on either peer,
> > a protection error on some buffer, etc.
> Do you mean a protection key such as the l_key/r_key isn't set correctly?
> What kind of protection error could trigger IBV_WC_WR_FLUSH_ERR?

FLUSH_ERR is the error used whenever a queue pair goes into an error
state and there are still WQEs posted to the queue pair.  All
outstanding WQEs are returned with the status IBV_WC_WR_FLUSH_ERR.  This
is how you make sure you don't lose WQEs when the QP hits an error
state.  So, literally *anything* that can cause a QP to go into an ERROR
state will result in all WQEs currently posted to the QP being sent back
with this FLUSH_ERR.  FLUSH_ERR literally just means that the card is
flushing out the QP's work queue because now that the QP is in an error
state it can't process the WQEs and, presumably, the application needs
to know which ones completed and which ones didn't so it knows what to
requeue once the QP is no longer in an error state.

As Tom has already pointed out, all of these things will throw the queue
pair into an error state and cause all posted WQEs to be flushed with
the FLUSH_ERR condition:

1) Loss of queue pair connection
2) Any memory permission violation (attempt to write to read only
memory, attempt to RDMA read/write to an invalid rkey, etc)
3) Receipt of any post_send message without a waiting post_recv buffer
to accept the message
4) Receipt of a post_send message that is too large to fit in the first
available post_recv buffer
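
When triaging, it usually helps to skip past the flushes and look for the
first completion that carries some other error status, since that one
tends to name the actual cause. A rough sketch of that filtering in Python
follows. The function names and the HINTS table are hypothetical, and the
hint texts are my own summary of what these statuses typically indicate,
not an authoritative mapping. The status strings mirror real
ibv_wc_status names.

```python
OK = "IBV_WC_SUCCESS"
FLUSH = "IBV_WC_WR_FLUSH_ERR"

# Assumption: typical meanings only; check the verbs documentation
# for the authoritative description of each status.
HINTS = {
    "IBV_WC_RETRY_EXC_ERR": "peer stopped responding (connection lost?)",
    "IBV_WC_RNR_RETRY_EXC_ERR": "peer had no receive posted for too long",
    "IBV_WC_REM_ACCESS_ERR": "remote protection error (bad rkey or permissions)",
    "IBV_WC_LOC_LEN_ERR": "message did not fit the posted buffer",
}


def first_real_error(completions):
    """Return the first (wr_id, status) that is neither success nor flush."""
    for wr_id, status in completions:
        if status not in (OK, FLUSH):
            return wr_id, status
    return None


def explain(completions):
    """Produce a one-line triage hint for a list of polled completions."""
    hit = first_real_error(completions)
    if hit is None:
        return "only flushes seen here; look for the root cause on the peer"
    wr_id, status = hit
    hint = HINTS.get(status, "no hint recorded for this status")
    return f"wr_id {wr_id}: {status}: {hint}"


polled = [
    (10, OK),
    (11, "IBV_WC_RNR_RETRY_EXC_ERR"),
    (12, FLUSH),
    (13, FLUSH),
]
print(explain(polled))
print(explain([(20, FLUSH), (21, FLUSH)]))
```

If a queue shows nothing but flushes, as in the second call, the
completion that explains the failure was most likely reported on the
other queue or on the other side of the connection.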

A common cause of this sort of thing is when you don't do proper flow
control on the queue pair and the sending side floods the receiving side
and runs it out of posted recv WQEs.  In your case, though, you said
this was happening on the receive queue, which implies the receiving
side, so if that is what's happening here, the process would have to be
something like:

	sender starts sending data (maybe without any flow control)
	receiver starts receiving data and refilling buffers
	...
	receiver runs totally dry of buffers and an incoming message
	arrives, causing the qp to go into error state

	receiver then posts refill buffers to the RQ after the QP
	went into error state but before acknowledging the error state
	and shutting down the recv processing thread

	all recv buffers posted as WQEs are flushed back to the process
	with FLUSH_ERR because they were posted to a QP in ERROR state
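
One common way to avoid that sequence is credit-based flow control: the
sender transmits only while it holds a credit, and the receiver hands
credits back as it reposts buffers. The Python sketch below is a toy
simulation under that assumption. CreditSender, Receiver and run are
hypothetical names, and a real application would carry the credit
updates in its own protocol messages.

```python
class CreditSender:
    """Sends only while it holds credits granted by the receiver."""

    def __init__(self, initial_credits):
        self.credits = initial_credits

    def try_send(self):
        if self.credits == 0:
            return False        # would risk hitting an empty RQ, so wait
        self.credits -= 1
        return True

    def grant(self, n):
        self.credits += n


class Receiver:
    """Tracks how many receive buffers are currently posted."""

    def __init__(self, buffers):
        self.posted = buffers

    def on_message(self):
        if self.posted == 0:
            # In the scenario above this is where the QP would fail.
            raise RuntimeError("no receive posted: QP would go to error")
        self.posted -= 1

    def repost(self, n):
        self.posted += n
        return n                # credits to hand back to the sender


def run(total, buffers):
    """Send `total` messages through `buffers` receive slots."""
    rx = Receiver(buffers)
    tx = CreditSender(buffers)
    sent = 0
    stalls = 0
    while sent < total:
        if tx.try_send():
            rx.on_message()
            sent += 1
        else:
            # Sender is out of credits: receiver refills and grants more.
            stalls += 1
            tx.grant(rx.repost(buffers - rx.posted))
    return sent, stalls


print(run(10, 4))
```

With 4 buffers and 10 messages the sender stalls twice waiting for
credits, but the receiver never sees a message without a posted buffer.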

> > If you're looking to actually trigger this error for testing,
> > well, try one of the above. If you're trying to figure out
> > why it's happening, that can take some digging, but not in
> > the RDMA stack, typically.
> Many thanks.
> 
> --Changcheng
> > Tom.
> > 

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD