Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array

Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx> · Wed, 26 Apr 2017 03:39:12 +0000

On Tue, 2017-04-25 at 16:37 -0400, Laurence Oberman wrote:
> Hello Bart, Leon, Max and Israel.
> 
> I cloned off Bart's tree.
> 
> git clone https://github.com/bvanassche/linux
> cd linux
> git checkout block-scsi-for-next
> 
> I checked all patches were in for this test.
> 
> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS
> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt
> 
> Built and tested the kernel.
> 
> However this issue is not resolved :(
> 
> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817edca86b0
> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe
> [ 2708.121342] 00000000 00000000 00000000 00000000
> [ 2708.147104] 00000000 00000000 00000000 00000000
> [ 2708.172633] 00000000 00000000 00000000 00000000
> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0
> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded
> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff8817ed0a9c30

Hello Laurence,

Thank you for running this test. But are you aware that a flush error
reported at the initiator side does not necessarily mean that there is a
bug at the initiator side? For example, if the target system initiates a
disconnect, that also triggers this kind of flush error. What kind of SRP
target system was used in this test? Were the clocks of the initiator and
target systems synchronized? Are the logs of the target system available? If
so, can you check whether anything interesting appears in the target log
around the time the initiator reported the flush error?
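
In case it helps with that comparison, below is a minimal sketch of how one
could pull the lines around a given timestamp out of a saved kernel log. It
is only an illustration: the function name lines_near() and its parameters
window and offset are made up for this example. It assumes dmesg-style
"[ seconds.micros]" prefixes, which count seconds since boot, so the offset
between the two systems has to be supplied by hand unless both logs carry
synchronized wall-clock timestamps.

```python
import re

# Matches the "[ seconds.micros]" prefix that dmesg puts on each line.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def lines_near(log_lines, event_ts, window=30.0, offset=0.0):
    """Return the log lines close in time to an event.

    log_lines: iterable of dmesg-style lines (hypothetical input, e.g. the
               target log read from a file).
    event_ts:  timestamp of the event on the initiator, in seconds.
    window:    how many seconds before and after the event to keep.
    offset:    assumed difference between the two systems' timestamps;
               it is added to every timestamp parsed from log_lines.
    """
    selected = []
    for line in log_lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # skip lines without a timestamp prefix
        ts = float(m.group(1)) + offset
        if abs(ts - event_ts) <= window:
            selected.append(line)
    return selected
```

For the first flush error quoted above one would call something like
lines_near(target_log, 2707.931909, window=30.0, offset=...) where
target_log holds the lines of the target log and the offset still has to be
determined for the two systems involved.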

Thanks,

Bart.