On 3/4/21 2:58 AM, Zhu Yanjun wrote:
> On Thu, Mar 4, 2021 at 7:02 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>>
>> Three errors occurred in the fix referenced below.
>>
>> 1) rxe_rcv_mcast_pkt() dropped a reference to ib_device when
>> no error occurred causing an underflow on the reference counter.
>> This code is cleaned up to be clearer and easier to read.
>>
>> 2) Extending the reference taken by rxe_get_dev_from_net() in
>> rxe_udp_encap_recv() until each skb is freed was not matched by
>> a reference in the loopback path resulting in underflows.
>>
>> 3) In rxe_comp.c the function free_pkt() did not clear skb which
>> triggered a warning at done: and could possibly at exit: in
>> rxe_completer(). The WARN_ONCE() calls are not actually needed.
>>
>> This patch fixes these errors.
>>
>> Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
>> Signed-off-by: Bob Pearson <rpearson@xxxxxxx>
>> ---
>> Version 2:
>> v1 of this patch incorrectly added a WARN_ON_ONCE in rxe_completer
>> where it could be triggered for normal traffic. This version
>> replaced that with a pr_warn located correctly.
>>
>> v1 of this patch placed a call to kfree_skb in an if statement
>> that could trigger style warnings. This version cleans that up.
>>
>>  drivers/infiniband/sw/rxe/rxe_comp.c |  6 +--
>>  drivers/infiniband/sw/rxe/rxe_net.c  | 10 ++++-
>>  drivers/infiniband/sw/rxe/rxe_recv.c | 60 +++++++++++++++++-----------
>>  3 files changed, 48 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index a8ac791a1bb9..96e5a73579f8 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -672,8 +672,10 @@ int rxe_completer(void *arg)
>>  			 */
>>
>>  			/* there is nothing to retry in this case */
>> -			if (!wqe || (wqe->state == wqe_state_posted))
>> +			if (!wqe || (wqe->state == wqe_state_posted)) {
>> +				pr_warn("Retry attempted without a valid wqe\n");
>>  				goto exit;
>> +			}
>>
>>  			/* if we've started a retry, don't start another
>>  			 * retry sequence, unless this is a timeout.
>> @@ -750,7 +752,6 @@ int rxe_completer(void *arg)
>>  	/* we come here if we are done with processing and want the task to
>>  	 * exit from the loop calling us
>>  	 */
>> -	WARN_ON_ONCE(skb);
>>  	rxe_drop_ref(qp);
>>  	return -EAGAIN;
>>
>> @@ -758,7 +759,6 @@ int rxe_completer(void *arg)
>>  	/* we come here if we have processed a packet we want the task to call
>>  	 * us again to see if there is anything else to do
>>  	 */
>> -	WARN_ON_ONCE(skb);
>
> With the above line kept, I ran tests with this commit.
> 1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> 2. cd rdma && git pull
> 3. apply this commit with "WARN_ON_ONCE(skb);" kept
>    and run tests with "rping ...."
> A similar problem still occurs.
>
> Zhu Yanjun

The WARNs occur because skb is not getting cleared, not because packets
are not being freed. The issue is whether the skbs are freed, not whether
a local variable still holds an (old) address of an skb. The following
would trigger a warning but doesn't mean anything:

	skb = alloc_skb(...);
	kfree_skb(skb);
	WARN_ON_ONCE(skb);

Every path out of the subroutine calls free_pkt() except one.
That is because I was trying not to change the behavior of the original
code. The exception occurs in the ERROR_RETRY state when no wqe is
available. All other paths call free_pkt(), which in turn calls
kfree_skb(). On that one path we leak an skb, which is not good, so we
should probably go ahead and free it too, just dropping the packet. In
that case we can move the free_pkt() to the end and make it explicit
that all the packets are actually freed. I will modify the code to do
that. The WARN is still not required.

Bob
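
For anyone following along, here is a rough user-space model of the
stale-pointer point, not the rxe code itself. Everything in it is a
hypothetical stand-in: struct pkt is not struct sk_buff, pkt_free()
only mimics the shape of kfree_skb() by setting a flag instead of
releasing memory, and would_warn() just evaluates the same condition
that WARN_ON_ONCE(skb) would test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; NOT the kernel type. */
struct pkt {
	int freed;	/* set once the buffer has been "freed" */
};

/* Models the shape of kfree_skb(): the buffer is released, but the
 * pointer is passed by value, so the caller's variable is untouched
 * and keeps the old address. A flag is used instead of free() so the
 * model can safely inspect the pointer afterwards.
 */
static void pkt_free(struct pkt *p)
{
	assert(!p->freed);	/* catch a double free in this model */
	p->freed = 1;
}

/* Hypothetical helper: free the buffer and also clear the caller's
 * pointer, so a later non-NULL check reflects ownership.
 */
static void pkt_free_and_clear(struct pkt **pp)
{
	pkt_free(*pp);
	*pp = NULL;
}

/* Same condition a WARN_ON_ONCE(p) would test: is p non-NULL? */
static int would_warn(const struct pkt *p)
{
	return p != NULL;
}
```

With pkt_free() alone the buffer is released and would_warn() still
reports true, which is the false alarm described above. With
pkt_free_and_clear() the check goes quiet. Either way the packet was
freed, so the check says nothing about leaks.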