Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

Hi Or,

I managed to update the kernel to OFED 3.0 to verify the bug, but I
can still reproduce it. Maybe some synchronize_irq() calls are still
missing?
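
To make the ordering I am checking for concrete, here is a rough userspace
sketch of the teardown sequence the commit message below describes: take
ownership away from the hardware, wait for in-flight handlers, and only then
free the buffer. This is an analogy using pthreads, not mlx4 or kernel code.
Every name in it (fake_eq, fake_irq_handler, fake_synchronize_irq,
fake_free_eq, run_teardown_demo) is made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event queue; not the mlx4 structure. */
struct fake_eq {
    atomic_int hw_owned;    /* 1 while the "device" may still post events */
    atomic_int in_handler;  /* number of handlers currently executing */
    atomic_int stop;        /* tells the interrupt-source thread to exit */
    long *buf;              /* stands in for the EQ buffer */
};

/* Models an interrupt handler running on another CPU. */
static void fake_irq_handler(struct fake_eq *eq)
{
    atomic_fetch_add(&eq->in_handler, 1);
    if (atomic_load(&eq->hw_owned))
        eq->buf[0]++;       /* touch the buffer only while the queue is live */
    atomic_fetch_sub(&eq->in_handler, 1);
}

/* Models the device raising interrupts until told to stop. */
static void *fake_irq_source(void *arg)
{
    struct fake_eq *eq = arg;

    while (!atomic_load(&eq->stop)) {
        fake_irq_handler(eq);
        sched_yield();      /* leave a gap between "interrupts" */
    }
    return NULL;
}

/* Userspace analogue of synchronize_irq(): wait for in-flight handlers. */
static void fake_synchronize_irq(struct fake_eq *eq)
{
    while (atomic_load(&eq->in_handler) != 0)
        sched_yield();
}

/* Teardown in the order the commit message describes. */
static void fake_free_eq(struct fake_eq *eq)
{
    atomic_store(&eq->hw_owned, 0);  /* 1. move ownership to software */
    fake_synchronize_irq(eq);        /* 2. let running handlers finish */
    free(eq->buf);                   /* 3. only now release the buffer */
    eq->buf = NULL;
}

/* Runs the teardown while the handler thread is active; returns 0 on success. */
int run_teardown_demo(void)
{
    struct fake_eq eq;
    pthread_t tid;

    atomic_init(&eq.hw_owned, 1);
    atomic_init(&eq.in_handler, 0);
    atomic_init(&eq.stop, 0);
    eq.buf = calloc(1, sizeof(*eq.buf));
    if (!eq.buf)
        return -1;
    if (pthread_create(&tid, NULL, fake_irq_source, &eq) != 0) {
        free(eq.buf);
        return -1;
    }

    fake_free_eq(&eq);               /* deliberately concurrent with the handler */

    atomic_store(&eq.stop, 1);
    pthread_join(tid, NULL);

    assert(eq.buf == NULL);
    assert(atomic_load(&eq.in_handler) == 0);
    return 0;
}
```

In this model, skipping step 2 leaves a window where a handler that has
already passed the hw_owned check can still touch the buffer after it is
freed, which is the kind of use-after-free I suspect we are hitting.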

Thanks
Jack

2015-07-08 16:07 GMT+02:00 Jack Wang <xjtuwjp@xxxxxxxxx>:
> Thanks for your time.
>
> Looks like the last one is missing in the OFED 2.4 driver. I just checked
> the mainline history:
>
> commit bf1bac5b7882daa41249f85fbc97828f0597de5c
> Author: Eli Cohen <eli@xxxxxxxxxxxxxxxxxx>
> Date:   Thu Oct 23 15:57:27 2014 +0300
>
>     net/mlx4_core: Call synchronize_irq() before freeing EQ buffer
>
>     After moving the EQ ownership to software effectively destroying it, call
>     synchronize_irq() to ensure that any handler routines running on other CPU
>     cores finish execution. Only then free the EQ buffer.
>     The same thing is done when we destroy a CQ which is one of the sources
>     generating interrupts. In the case of CQ we want to avoid completion handlers
>     on a CQ that was destroyed. In the case we do the same to avoid receiving
>     asynchronous events after the EQ has been destroyed and its buffers freed.
>
>     Signed-off-by: Eli Cohen <eli@xxxxxxxxxxxx>
>     Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> This fix looks like it fits the bug we're hitting. Yes, we plan to update
> to OFED 3.0 soon, and the fix is included there.
> I will report back if the bug is still there.
>
> Thanks again.
> Jack
>
> 2015-07-08 15:49 GMT+02:00 Or Gerlitz <ogerlitz@xxxxxxxxxxxx>:
>> On 7/8/2015 3:47 PM, Jack Wang wrote:
>>>
>>> static void mlx4_ib_cq_comp(struct mlx4_cq *cq)
>>> {
>>>         struct ib_cq *ibcq = &to_mibcq(cq)->ibcq;
>>>         ibcq->comp_handler(ibcq, ibcq->cq_context);
>>> }
>>>
>>> Looks like a CQ use-after-free? I have no idea where.
>>
>>
>> see if you have in the code base you're using (why not the stock 3.18.14
>> driver, BTW?) all the synchronize_irq
>> calls we have in the latest upstream driver:
>>
>> drivers/net/ethernet/mellanox/mlx4/cq.c:371:
>> synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
>> drivers/net/ethernet/mellanox/mlx4/cq.c:374:
>> synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);
>> drivers/net/ethernet/mellanox/mlx4/eq.c:1088: synchronize_irq(eq->irq);
>>