Re: CRASH 3.18-rc2, 3.17.1, isert_connect_request

On 03.11.2014 at 12:27, Sagi Grimberg wrote:
On 11/3/2014 12:28 PM, Adam Mazur wrote:
Can someone help us with these crashes? We are not able to reproduce them
on demand; a crash takes anywhere from 30 minutes to a few hours to appear.
We've seen it on kernel 3.17.1 and 3.18-rc2.


Hey Adam,

CC'ing target-devel mailing list (where iser target is maintained).

So I stepped on this issue as well, and I actually have a fix for it
in the pipe. I'm planning to test it with a few other fixes for a little
while longer before I submit the code.

In general, this crash occurs due to a race between tpg shutdown (or
np disable) and RDMA_CM connect requests happening in parallel. The iser
target tries to dereference a tpg attribute while np->tpg_np is
actually NULL.
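
To make the race concrete, here is a minimal user-space sketch of one way such a dereference could be guarded. It is not the fix mentioned above (that patch is not posted in this message), and every name in it is a hypothetical stand-in: the structs only mimic the np->tpg_np->tpg chain, `some_attr` is a placeholder for whichever tpg attribute is read, and a pthread mutex stands in for whatever synchronization the kernel code would actually use.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures named in the thread. */
struct tpg_attrib { int some_attr; };          /* placeholder attribute */
struct tpg        { struct tpg_attrib attrib; };
struct tpg_np     { struct tpg *tpg; };

struct np {
    pthread_mutex_t lock;    /* assumption: serializes access to tpg_np */
    struct tpg_np  *tpg_np;  /* cleared when the portal is disabled */
};

/*
 * Models the connect-request path. Returns 0 and stores the attribute
 * in *attr_out, or -1 if the portal has already been torn down.
 * The NULL check and the dereference happen under the same lock, so
 * np_disable() cannot clear the pointer in between.
 */
int connect_request(struct np *np, int *attr_out)
{
    int ret = -1;

    pthread_mutex_lock(&np->lock);
    if (np->tpg_np != NULL) {
        *attr_out = np->tpg_np->tpg->attrib.some_attr;
        ret = 0;
    }
    pthread_mutex_unlock(&np->lock);
    return ret;
}

/* Models tpg shutdown / np disable: detaches the portal from its tpg. */
void np_disable(struct np *np)
{
    pthread_mutex_lock(&np->lock);
    np->tpg_np = NULL;
    pthread_mutex_unlock(&np->lock);
}
```

In this model a connect request that arrives after np_disable() is rejected with -1 instead of following a NULL pointer; without the shared lock, the check and the dereference could be separated by the teardown.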

How many targets/initiators/portals did you use? HCA?

Hi Sagi,

There are about 300 targets (lvm volumes), 4 initiators, two portals.

HCA by lspci:
05:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
        Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
        Flags: bus master, fast devsel, latency 0, IRQ 46
        Memory at df500000 (64-bit, non-prefetchable) [size=1M]
        Memory at de800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
        Capabilities: [84] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Kernel driver in use: ib_mthca


root@portal-1:~# mstflint -d 05:00.0 q
Image type:      Failsafe
FW Version:      1.2.0
I.S. Version:    1
Device ID:       25204
Chip Revision:   A0
Description:     Node             Port1            Sys image
GUIDs:           0005ad00000c75c8 0005ad00000c75c9 0005ad00000c75cb
Board ID:         (MT_0260000002)
VSD:             
PSID:            MT_0260000002


root@portal-2:~# mstflint -d 05:00.0 q
Image type:      Failsafe
I.S. Version:    1
Chip Revision:   A0
Description:     Node             Port1            Sys image
GUIDs:           0005ad00000c7010 0005ad00000c7011 0005ad00000c7013
Board ID:         (MT_0260000002)
VSD:             
PSID:            MT_0260000002


Would it be possible to send you some patches to test as well?

Absolutely, we can immediately test any patch on any kernel version.

Thanks
Adam