Hi Or,

I managed to update the kernel to OFED 3.0 to verify the bug, but I can
still reproduce it. Maybe some synchronize_irq() calls are still missing?

Thanks,
Jack

2015-07-08 16:07 GMT+02:00 Jack Wang <xjtuwjp@xxxxxxxxx>:
> Thanks for your time.
>
> It looks like the last one is missing in the OFED 2.4 driver. I just
> checked the mainline history:
>
> commit bf1bac5b7882daa41249f85fbc97828f0597de5c
> Author: Eli Cohen <eli@xxxxxxxxxxxxxxxxxx>
> Date: Thu Oct 23 15:57:27 2014 +0300
>
>     net/mlx4_core: Call synchronize_irq() before freeing EQ buffer
>
>     After moving the EQ ownership to software effectively destroying it, call
>     synchronize_irq() to ensure that any handler routines running on other CPU
>     cores finish execution. Only then free the EQ buffer.
>     The same thing is done when we destroy a CQ which is one of the sources
>     generating interrupts. In the case of CQ we want to avoid completion handlers
>     on a CQ that was destroyed. In the case we do the same to avoid receiving
>     asynchronous events after the EQ has been destroyed and its buffers freed.
>
>     Signed-off-by: Eli Cohen <eli@xxxxxxxxxxxx>
>     Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> This fix looks like it fits the bug we're hitting. Yes, we plan to update
> to OFED 3.0 soon, and the fix is included there.
> I will report back if the bug is still there.
>
> Thanks again.
> Jack
>
> 2015-07-08 15:49 GMT+02:00 Or Gerlitz <ogerlitz@xxxxxxxxxxxx>:
>> On 7/8/2015 3:47 PM, Jack Wang wrote:
>>>
>>> static void mlx4_ib_cq_comp(struct mlx4_cq *cq)
>>> {
>>>         struct ib_cq *ibcq = &to_mibcq(cq)->ibcq;
>>>         ibcq->comp_handler(ibcq, ibcq->cq_context);
>>> }
>>>
>>> Looks like cq use-after-free? I have no idea where.
>>
>>
>> see if you have in the code base you're using (why not the stock 3.18.14
>> driver, BTW?) all the synchronize_irq calls we have in the latest
>> upstream driver:
>>
>> drivers/net/ethernet/mellanox/mlx4/cq.c:371:
>> synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
>> drivers/net/ethernet/mellanox/mlx4/cq.c:374:
>> synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
>> drivers/net/ethernet/mellanox/mlx4/eq.c:1088: synchronize_irq(eq->irq);