On Wed, Jan 10, 2018 at 05:01:40PM -0500, Doug Ledford wrote:
> On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> > On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > > The following sequence:
> > > * Change queue pair state into IB_QPS_ERR.
> > > * Post a work request on the queue pair.
> > > Triggers the following race condition in the rdma_rxe driver:
> > > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> > >   that examines the QP send queue.
> > > * rxe_post_send() posts a work request on the QP send queue.
> >
> > If rxe_completer() runs before rxe_post_send(), the send queue is
> > believed to be empty while a stale work request stays on the send queue
> > indefinitely. To avoid this race, schedule rxe_completer() after a work
> > request is queued on a qp in the error state by rxe_post_send().
> >
> > I think that improves the log message, yes?
> >
>
> I did some further edits. But, patch applied to for-next.

The proposed patch definitely decreases the chance of races, but it does
not fix them. The qp state can still change immediately after your
"if ..." check.

Thanks

>
> --
> Doug Ledford <dledford@xxxxxxxxxx>
> GPG KeyID: B826A3330E572FDD
> Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
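
To make the ordering described in the quoted commit message concrete, here is a minimal single-threaded sketch that replays it step by step. It is a toy model, not the rdma_rxe code: every toy_* name, the struct layout, and the "completer_pending" flag are assumptions made up for illustration, and the real driver schedules rxe_completer() as an asynchronous task rather than calling it inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not the rdma_rxe API. */
enum toy_qp_state { TOY_QPS_RTS, TOY_QPS_ERR };

struct toy_qp {
    enum toy_qp_state state;
    size_t sq_len;          /* work requests waiting on the send queue */
    size_t flushed;         /* work requests flushed by the completer */
    bool completer_pending; /* a completer run has been scheduled */
};

/* Stands in for rxe_qp_error(): enter the error state, schedule the completer. */
void toy_qp_error(struct toy_qp *qp)
{
    qp->state = TOY_QPS_ERR;
    qp->completer_pending = true;
}

/* Stands in for the asynchronous completer: runs only if it was scheduled,
 * and flushes the send queue when the qp is in the error state. */
void toy_run_completer(struct toy_qp *qp)
{
    if (!qp->completer_pending)
        return;
    qp->completer_pending = false;
    if (qp->state == TOY_QPS_ERR) {
        qp->flushed += qp->sq_len;
        qp->sq_len = 0;
    }
}

/* Stands in for rxe_post_send(); 'reschedule' selects the patched behaviour
 * of scheduling the completer again when the qp is already in error. */
void toy_post_send(struct toy_qp *qp, bool reschedule)
{
    qp->sq_len++;
    if (reschedule && qp->state == TOY_QPS_ERR)
        qp->completer_pending = true;
}

/* Replays: error transition, completer run, late post, next completer slot.
 * Returns the number of work requests left stale on the send queue. */
size_t toy_replay(bool reschedule)
{
    struct toy_qp qp = { TOY_QPS_RTS, 0, 0, false };

    toy_qp_error(&qp);
    toy_run_completer(&qp);          /* sees an empty send queue */
    assert(qp.sq_len == 0);
    toy_post_send(&qp, reschedule);  /* the work request arrives too late */
    toy_run_completer(&qp);          /* runs only if it was rescheduled */
    return qp.sq_len;
}
```

Calling toy_replay(false) leaves one stale work request on the queue, while toy_replay(true) leaves none. Because the model is strictly sequential, it only illustrates the ordering from the commit message; it says nothing about the remaining window after the "if ..." check raised in the reply above.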