On Tue, Dec 03, 2024 at 03:52:22PM +0200, Leon Romanovsky wrote:
> From: Or Har-Toov <ohartoov@xxxxxxxxxx>
>
> Remove the done list, which has become unnecessary with the
> introduction of the `state` parameter.
>
> Previously, the done list was used to ensure that MADs removed from
> the wait list would still be in some list, preventing failures in
> the call to `list_del` in `ib_mad_complete_send_wr`.

Yuk, that is a terrible reason for this. list_del_init() would solve
that problem.

> @@ -1772,13 +1771,11 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
>  void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
>  {
>  	mad_send_wr->timeout = 0;
> -	if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP) {
> +	list_del(&mad_send_wr->agent_list);

This is doing more than the commit message says: we are now changing
when the MAD is on the list. Here you remove it as soon as it becomes
done, or semi-done, instead of letting ib_mad_complete_send_wr() always
remove it. I couldn't find a reason it is not OK, but I think it should
be in the commit message.

>  static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
> @@ -2249,7 +2246,9 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
>  	}
>
>  	/* Remove send from MAD agent and notify client of completion */
> -	list_del(&mad_send_wr->agent_list);
> +	if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
> +		list_del(&mad_send_wr->agent_list);
> +

This extra if is confusing now. There are two callers to
ib_mad_complete_send_wr(). The receive path did ib_mark_mad_done()
first, so the state should be DONE or EARLY_RESP and the list_del() was
done already. The other one is send completion, where the state should
be SEND_START, and only if we hit an error making the send do we remove
it from the list?

Again, I think this needs to go further and stop using ->status as part
of the FSM too.
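To illustrate the list_del_init() point, here is a minimal userspace sketch. It re-implements a tiny subset of the kernel's list helpers (this is not the real <linux/list.h>; NULL stands in for LIST_POISON1/LIST_POISON2), and `struct send_wr`, `list_unlink()` and `check_double_removal()` are hypothetical names. Once an entry has been removed with list_del_init() it points at itself, so a second removal from the other completion path only touches the entry and leaves the rest of the wait list intact, which is what the done list was being used to guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head;
 * an illustration only, not <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical helper: link prev and next to each other. */
static void list_unlink(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

/* Like list_del(): unlink, then poison the pointers (NULL here), so a
 * second list_del() on the same entry would dereference NULL. */
static void list_del(struct list_head *e)
{
	list_unlink(e->prev, e->next);
	e->next = NULL;
	e->prev = NULL;
}

/* Like list_del_init(): unlink, then point the entry at itself, so a
 * later list_del()/list_del_init() on it only touches the entry. */
static void list_del_init(struct list_head *e)
{
	list_unlink(e->prev, e->next);
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-in for struct ib_mad_send_wr_private. */
struct send_wr {
	int id;
	struct list_head agent_list;
};

/* Remove the same entry more than once, as two completion paths might,
 * and check that the other entry on the list is untouched. */
static void check_double_removal(void)
{
	struct list_head wait_list;
	struct send_wr a = { .id = 1 }, b = { .id = 2 };

	INIT_LIST_HEAD(&wait_list);
	list_add_tail(&a.agent_list, &wait_list);
	list_add_tail(&b.agent_list, &wait_list);

	list_del_init(&a.agent_list);   /* e.g. response matched */
	assert(list_empty(&a.agent_list));
	list_del_init(&a.agent_list);   /* later completion: harmless */
	list_del(&a.agent_list);        /* even plain list_del() is safe now */

	assert(wait_list.next == &b.agent_list);
	assert(wait_list.prev == &b.agent_list);
	assert(b.agent_list.next == &wait_list);
	assert(b.agent_list.prev == &wait_list);
}
```

The same idea in the real code would be to switch the removal sites to list_del_init() so that whichever path runs second finds an empty, self-linked entry rather than poisoned pointers.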
Trying again, maybe like this:

	spin_lock_irqsave(&mad_agent_priv->lock, flags);
	if (ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent)) {
		ret = ib_process_rmpp_send_wc(mad_send_wr, mad_send_wc);
		if (ret == IB_RMPP_RESULT_CONSUMED)
			goto done;
	} else
		ret = IB_RMPP_RESULT_UNHANDLED;

	if (mad_send_wr->state == IB_MAD_STATE_SEND_START) {
		if (mad_send_wc->status == IB_WC_SUCCESS &&
		    mad_send_wr->timeout) {
			wait_for_response(mad_send_wr);
			goto done;
		}
		/* Otherwise error posting send */
		list_del(&mad_send_wr->agent_list);
	} else {
		WARN_ON(mad_send_wr->state != IB_MAD_STATE_EARLY_RESP &&
			mad_send_wr->state != IB_MAD_STATE_DONE);
	}

	mad_send_wr->state = IB_MAD_STATE_DONE;
	mad_send_wr->status = mad_send_wc->status;
	adjust_timeout(mad_agent_priv);
	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);

Though that might require changing cancel_mad too. As in the other
message, I think with the FSM cancel_mad should set the state to DONE
and leave the status alone.

This status fiddling is probably another patch.

Jason
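A rough userspace model of the proposed completion logic may help check that each MAD leaves the agent list exactly once. Everything here is hypothetical: `struct mad_model`, `model_mark_done()`, `model_complete_send()` and `check_scenarios()` are illustrative names, RMPP and locking are left out, and it assumes (a) the wait-for-response branch is taken when the status is `IB_WC_SUCCESS`, (b) the WARN_ON only applies outside the SEND_START branch, and (c) ib_mark_mad_done() moves WAIT_RESP to DONE and SEND_START to EARLY_RESP, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the enum mirrors the IB_MAD_STATE_*
 * names from the thread, but none of this is kernel code. */
enum mad_state { SEND_START, WAIT_RESP, EARLY_RESP, DONE };

struct mad_model {
	enum mad_state state;
	int timeout;         /* > 0: a response is expected */
	int status;          /* 0 stands in for IB_WC_SUCCESS */
	bool on_agent_list;  /* stands in for agent_list membership */
	int warnings;        /* times a WARN_ON() would have fired */
};

/* Unlink; unlinking an entry that is not on a list counts as a bug. */
static void model_list_del(struct mad_model *m)
{
	if (!m->on_agent_list)
		m->warnings++;
	m->on_agent_list = false;
}

/* Receive path (patched ib_mark_mad_done()): leave the list right away.
 * ASSUMPTION: WAIT_RESP -> DONE, and SEND_START -> EARLY_RESP when the
 * response beats the send completion. */
static void model_mark_done(struct mad_model *m)
{
	m->timeout = 0;
	model_list_del(m);
	m->state = (m->state == WAIT_RESP) ? DONE : EARLY_RESP;
}

/* Proposed ib_mad_complete_send_wr() logic, RMPP omitted. Returns true
 * when the client's send_handler would be called. */
static bool model_complete_send(struct mad_model *m, int wc_status)
{
	if (m->state == SEND_START) {
		if (wc_status == 0 && m->timeout) {
			/* ASSUMPTION: wait_for_response() sets WAIT_RESP */
			m->state = WAIT_RESP;
			return false;   /* goto done */
		}
		/* Send failed, or no response expected */
		model_list_del(m);
	} else if (m->state != EARLY_RESP && m->state != DONE) {
		m->warnings++;      /* WARN_ON() */
	}
	m->state = DONE;
	m->status = wc_status;
	return true;
}

static void check_scenarios(void)
{
	/* Send OK, wait, then the response arrives. */
	struct mad_model a = { SEND_START, 1, 0, true, 0 };
	assert(!model_complete_send(&a, 0));
	assert(a.state == WAIT_RESP && a.on_agent_list);
	model_mark_done(&a);
	assert(model_complete_send(&a, 0));
	assert(a.state == DONE && !a.on_agent_list && a.warnings == 0);

	/* Error posting the send: removed here, status recorded. */
	struct mad_model b = { SEND_START, 1, 0, true, 0 };
	assert(model_complete_send(&b, 5));
	assert(b.state == DONE && b.status == 5 && !b.on_agent_list);
	assert(b.warnings == 0);

	/* Response arrives before the send completion. */
	struct mad_model c = { SEND_START, 1, 0, true, 0 };
	model_mark_done(&c);
	assert(c.state == EARLY_RESP && !c.on_agent_list);
	assert(model_complete_send(&c, 0));
	assert(c.state == DONE && c.warnings == 0);
}
```

Counting a double unlink as a warning makes the ownership rule explicit: once ib_mark_mad_done() has run, the receive path owns the removal, and send completion only removes entries that are still in SEND_START.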