Re: [PATCH 0/3] scsi: fcoe: memleak fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Thu, Aug 09, 2018 at 12:01:30PM +0200, ard wrote:
> This to determine if we have a single regression in just the
> login handling or both.
As a matter of fact, I think this will not work on vn2vn:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/libfc?id=386b97b43c0c9e0d878eec7ea1db16af22b036ae
scsi: libfc: Rework PRLI handling
<theorising>
As it clearly rejects the PRLI and eventually sends a LOGO when
no store is associated.
But as san setup goes: it first attaches the network driver to
the fcoe layer and hence is store less. Then it attaches stores
to export. But at that moment all possible initiators already got
a LOGO
</theorising>

But looking back at the logs, it is not all that bad as what I
said.  It's not erroring ad infinitum, it's: 

root@antec:~/logs# grep "vn_add rport 00c76e\|kmemleak" 2018-08-08-kern.log|cut -d\  -f9-|uniq -c
      1   2.577320] kmemleak: Kernel memory leak detector initialized
      1   2.577350] kmemleak: Automatic memory scanning thread started
      1 136.452894] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      1 host10: fip: vn_add rport 00c76e new state 0
      1 host10: fip: vn_add rport 00c76e old state 0
      8 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     50 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      2 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 47 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     50 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     50 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 47 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     52 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 50 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     50 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 47 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     50 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 55 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     52 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 46 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     50 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 46 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
     36 host10: fip: vn_add rport 00c76e old state 4
      1 kmemleak: 50 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      1 kmemleak: 36 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      1 kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      1 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

>From the last few lines we can clearly see that de vn_add rport ...  old state 4
coincides with a kmemleak 10 minutes later. especially 50 and 36. Now the exact
dump follows for that.

Now to get back to what is the difference:
This is a working login with a 4.9:

Aug  8 10:53:15 localhost kernel: [   14.929451] host10: fip: vn_add rport 0004e0 old state 0
Aug  8 10:53:21 localhost kernel: [   23.143274] host10: fip: beacon from rport 4e0
Aug  8 10:53:21 localhost kernel: [   23.143275] host10: fip: beacon expired for rport 4e0
Aug  8 10:53:21 localhost kernel: [   23.143276] host10: rport 0004e0: Login to port
Aug  8 10:53:21 localhost kernel: [   23.143277] host10: rport 0004e0: Entered FLOGI state from Init state
Aug  8 10:53:21 localhost kernel: [   23.143294] host10: fip: els_send op 7 d_id 4e0
Aug  8 10:53:21 localhost kernel: [   23.143301] host10: fip: beacon from rport 4e0
Aug  8 10:53:21 localhost kernel: [   23.143920] host10: rport 0004e0: Received a FLOGI accept
Aug  8 10:53:21 localhost kernel: [   23.143921] host10: rport 0004e0: Port entered PLOGI state from FLOGI state
Aug  8 10:53:21 localhost kernel: [   23.143956] host10: rport 0004e0: Received a PLOGI accept
Aug  8 10:53:21 localhost kernel: [   23.143958] host10: rport 0004e0: Port entered PRLI state from PLOGI state
Aug  8 10:53:21 localhost kernel: [   23.143982] host10: rport 0004e0: Received a PRLI accept
Aug  8 10:53:21 localhost kernel: [   23.143984] host10: rport 0004e0: PRLI spp_flags = 0x0 spp_type 0x20
Aug  8 10:53:21 localhost kernel: [   23.143985] host10: rport 0004e0: Error -6 in state PRLI, retrying
Aug  8 10:53:21 localhost kernel: [   23.144434] host10: rport 0004e0: Received PRLI request while in state PRLI
Aug  8 10:53:21 localhost kernel: [   23.144445] host10: rport 0004e0: PRLI rspp type 8 active 1 passive 0
Aug  8 10:53:21 localhost kernel: [   23.749492] host10: rport 0004e0: Received RTV request
Aug  8 10:53:23 localhost kernel: [   25.204559] host10: rport 0004e0: Port timeout, state PRLI
Aug  8 10:53:23 localhost kernel: [   25.347400] host10: rport 0004e0: Port entered PRLI state from PRLI state
Aug  8 10:53:23 localhost kernel: [   25.393058] host10: rport 0004e0: Received a PRLI accept
Aug  8 10:53:23 localhost kernel: [   25.398412] host10: rport 0004e0: PRLI spp_flags = 0x21 spp_type 0x8
Aug  8 10:53:23 localhost kernel: [   25.404853] host10: rport 0004e0: Port entered RTV state from PRLI state
Aug  8 10:53:23 localhost kernel: [   25.450291] host10: rport 0004e0: Received a RTV reject
Aug  8 10:53:23 localhost kernel: [   25.455620] host10: rport 0004e0: Port is Ready
Aug  8 10:53:23 localhost kernel: [   25.460185] host10: rport 0004e0: work event 1
Aug  8 10:53:23 localhost kernel: [   25.464672] host10: rport 0004e0: No rport!
Aug  8 10:53:23 localhost kernel: [   25.469577] host10: rport 0004e0: callback ev 1
Aug  8 10:53:23 localhost kernel: [   25.474152] host10: fip: vn_rport_callback 4e0 event 1
Aug  8 10:53:30 localhost kernel: [   31.850891] host10: fip: beacon from rport 4e0

This is the error retry with a 4.16 unpatched:

Aug  8 11:33:50 localhost kernel: [ 2452.565524] host10: fip: vn_claim_notify: send reply to c76e
Aug  8 11:33:50 localhost kernel: [ 2452.571392] host10: fip: vn_add rport 00c76e new state 0
Aug  8 11:33:50 localhost kernel: [ 2452.576898] host10: fip: vn_claim_notify: send reply to c76e
Aug  8 11:33:50 localhost kernel: [ 2452.582605] host10: fip: vn_add rport 00c76e old state 0
Aug  8 11:33:57 localhost kernel: [ 2458.854081] host10: fip: beacon from rport c76e
Aug  8 11:33:57 localhost kernel: [ 2458.858622] host10: fip: beacon expired for rport c76e
Aug  8 11:33:57 localhost kernel: [ 2458.863760] host10: rport 00c76e: Login to port
Aug  8 11:33:57 localhost kernel: [ 2458.868455] host10: rport 00c76e: Entered FLOGI state from Init state
Aug  8 11:33:57 localhost kernel: [ 2458.875177] host10: fip: els_send op 7 d_id c76e
Aug  8 11:33:57 localhost kernel: [ 2458.886089] host10: fip: beacon from rport c76e
Aug  8 11:33:57 localhost kernel: [ 2458.896069] host10: rport 00c76e: Received a FLOGI accept
Aug  8 11:33:57 localhost kernel: [ 2458.901716] host10: rport 00c76e: Port entered PLOGI state from FLOGI state
Aug  8 11:33:57 localhost kernel: [ 2458.934161] host10: rport 00c76e: Received a PLOGI accept
Aug  8 11:33:57 localhost kernel: [ 2458.939651] host10: rport 00c76e: Port entered PRLI state from PLOGI state
Aug  8 11:33:57 localhost kernel: [ 2458.985427] host10: rport 00c76e: Received a PRLI reject
Aug  8 11:33:57 localhost kernel: [ 2458.990817] host10: rport 00c76e: PRLI ELS rejected, reason 5 expl 0
Aug  8 11:33:57 localhost kernel: [ 2458.997255] host10: rport 00c76e: Error 5 in state PRLI, retrying
Aug  8 11:33:59 localhost kernel: [ 2461.060167] host10: rport 00c76e: Port timeout, state PRLI
Aug  8 11:33:59 localhost kernel: [ 2461.065688] host10: rport 00c76e: Port entered PRLI state from PRLI state
Aug  8 11:33:59 localhost kernel: [ 2461.111048] host10: rport 00c76e: Received a PRLI reject
Aug  8 11:33:59 localhost kernel: [ 2461.116388] host10: rport 00c76e: PRLI ELS rejected, reason 9 expl 1e
Aug  8 11:33:59 localhost kernel: [ 2461.122927] host10: rport 00c76e: Error 5 in state PRLI, retrying
Aug  8 11:34:01 localhost kernel: [ 2463.172152] host10: rport 00c76e: Port timeout, state PRLI
Aug  8 11:34:01 localhost kernel: [ 2463.177696] host10: rport 00c76e: Port entered PRLI state from PRLI state
Aug  8 11:34:01 localhost kernel: [ 2463.224578] host10: rport 00c76e: Received a PRLI reject
Aug  8 11:34:01 localhost kernel: [ 2463.230023] host10: rport 00c76e: PRLI ELS rejected, reason 9 expl 1e
Aug  8 11:34:01 localhost kernel: [ 2463.236765] host10: rport 00c76e: Error 5 in state PRLI, retrying
Aug  8 11:34:03 localhost kernel: [ 2465.283937] host10: rport 00c76e: Port timeout, state PRLI
Aug  8 11:34:03 localhost kernel: [ 2465.289424] host10: rport 00c76e: Port entered PRLI state from PRLI state
Aug  8 11:34:03 localhost kernel: [ 2465.335150] host10: rport 00c76e: Received a PRLI reject
Aug  8 11:34:03 localhost kernel: [ 2465.340540] host10: rport 00c76e: PRLI ELS rejected, reason 9 expl 1e
Aug  8 11:34:03 localhost kernel: [ 2465.347068] host10: rport 00c76e: Error -5 in state PRLI, retries 3
Aug  8 11:34:03 localhost kernel: [ 2465.353411] host10: rport 00c76e: Port sending LOGO from PRLI state
Aug  8 11:34:03 localhost kernel: [ 2465.359760] host10: fip: els_send op 9 d_id c76e
Aug  8 11:34:05 localhost kernel: [ 2467.046011] host10: fip: beacon from rport c76e
Aug  8 11:34:05 localhost kernel: [ 2467.050549] host10: fip: beacon from rport c76e
Aug  8 11:34:09 localhost kernel: [ 2470.857393] host10: fip: vn_claim_notify: send reply to c76e
Aug  8 11:34:09 localhost kernel: [ 2470.863225] host10: fip: vn_add rport 00c76e old state 4
Aug  8 11:34:09 localhost kernel: [ 2470.868764] host10: fip: vn_claim_notify: send reply to c76e
Aug  8 11:34:09 localhost kernel: [ 2470.874463] host10: fip: vn_add rport 00c76e old state 4
Aug  8 11:34:13 localhost kernel: [ 2475.238121] host10: fip: beacon from rport c76e
Aug  8 11:34:13 localhost kernel: [ 2475.242660] host10: fip: beacon expired for rport c76e
Aug  8 11:34:13 localhost kernel: [ 2475.247804] host10: rport 00c76e: port already started

Repeat:

Aug  8 11:34:13 localhost kernel: [ 2475.253076] host10: fip: beacon from rport c76e
Aug  8 11:34:21 localhost kernel: [ 2483.429903] host10: fip: beacon from rport c76e
Aug  8 11:34:21 localhost kernel: [ 2483.434500] host10: fip: beacon from rport c76e
Aug  8 11:34:29 localhost kernel: [ 2491.621938] host10: fip: beacon from rport c76e
Aug  8 11:34:29 localhost kernel: [ 2491.626478] host10: fip: beacon from rport c76e

This part contains a memory leak after a timeout:
Aug  8 11:34:33 localhost kernel: [ 2495.433102] host10: fip: vn_claim_notify: send reply to c76e
Aug  8 11:34:33 localhost kernel: [ 2495.438842] host10: fip: vn_add rport 00c76e old state 4
Aug  8 11:34:33 localhost kernel: [ 2495.444266] host10: fip: vn_claim_notify: send reply to c76e
Aug  8 11:34:33 localhost kernel: [ 2495.450120] host10: fip: vn_add rport 00c76e old state 4
Aug  8 11:34:38 localhost kernel: [ 2499.813722] host10: fip: beacon from rport c76e
Aug  8 11:34:38 localhost kernel: [ 2499.818256] host10: fip: beacon expired for rport c76e
Aug  8 11:34:38 localhost kernel: [ 2499.823402] host10: rport 00c76e: port already started

You can find the raw kmemleak and kern.log here:
https://github.com/hardkernel/linux/issue_comments#issuecomment-411772096

And the some filtered logs a bit more up.

The logs when it all went ok:
https://github.com/hardkernel/linux/issues/360#issuecomment-411335647
Especially:
Aug  7 17:54:30 localhost kernel: [   15.540607] host10: fip: claim resp from from rport c76e - state VNMP_UP
Aug  7 17:54:30 localhost kernel: [   15.548710] host10: fip: vn_add rport 00c76e new state 0
Aug  7 17:54:30 localhost kernel: [   15.555463] host10: fip: claim resp from from rport c76e - state VNMP_UP
Aug  7 17:54:30 localhost kernel: [   15.563584] host10: fip: vn_add rport 00c76e old state 0
Aug  7 17:54:30 localhost kernel: [   15.696327] host10: fip: beacon expired for rport c76e
Aug  7 17:54:30 localhost kernel: [   15.702975] host10: rport 00c76e: Login to port
Aug  7 17:54:30 localhost kernel: [   15.707617] host10: rport 00c76e: Entered FLOGI state from Init state
Aug  7 17:54:30 localhost kernel: [   15.714224] host10: fip: els_send op 7 d_id c76e
Aug  7 17:54:30 localhost kernel: [   15.736741] host10: rport 00c76e: Received a FLOGI accept
Aug  7 17:54:30 localhost kernel: [   15.742572] host10: rport 00c76e: Port entered PLOGI state from FLOGI state
Aug  7 17:54:30 localhost kernel: [   15.749695] host10: rport 00c76e: Received a PLOGI accept
Aug  7 17:54:30 localhost kernel: [   15.749696] host10: rport 00c76e: Port entered PRLI state from PLOGI state
Aug  7 17:54:30 localhost kernel: [   15.750112] host10: rport 00c76e: Received a PRLI reject
Aug  7 17:54:30 localhost kernel: [   15.750113] host10: rport 00c76e: PRLI ELS rejected, reason 9 expl 1e
Aug  7 17:54:30 localhost kernel: [   15.750114] host10: rport 00c76e: Error 5 in state PRLI, retrying
Aug  7 17:54:30 localhost kernel: [   15.750360] host10: rport 00c76e: Received PRLI request while in state PRLI
Aug  7 17:54:30 localhost kernel: [   15.750371] host10: rport 00c76e: PRLI rspp type 8 active 1 passive 0
Aug  7 17:54:30 localhost kernel: [   15.750570] host10: rport 00c76e: Received RTV request
Aug  7 17:54:30 localhost kernel: [   17.758897] host10: rport 00c76e: Port timeout, state PRLI
Aug  7 17:54:30 localhost kernel: [   17.764559] host10: rport 00c76e: Port entered PRLI state from PRLI state
Aug  7 17:54:30 localhost kernel: [   17.812081] host10: rport 00c76e: Received a PRLI accept
Aug  7 17:54:31 localhost kernel: [   17.818809] host10: rport 00c76e: PRLI spp_flags = 0x21 spp_type 0x8
Aug  7 17:54:31 localhost kernel: [   17.826634] host10: rport 00c76e: Port entered RTV state from PRLI state
Aug  7 17:54:31 localhost kernel: [   17.878821] host10: rport 00c76e: Received a RTV accept
Aug  7 17:54:31 localhost kernel: [   17.885478] host10: rport 00c76e: Port is Ready
Aug  7 17:54:31 localhost kernel: [   17.891513] host10: rport 00c76e: work event 1
Aug  7 17:54:31 localhost kernel: [   17.897386] host10: rport 00c76e: No rport!
Aug  7 17:54:31 localhost kernel: [   17.903698] host10: rport 00c76e: callback ev 1
Aug  7 17:54:31 localhost kernel: [   17.909081] host10: fip: vn_rport_callback c76e event 1
Aug  7 17:54:34 localhost kernel: [   20.992106] host10: rport 00c76e: Received ADISC request

So in the "working" case I see that the other node (4.16 unpatched) accepts our
PRLI, if I interpret that right, and in the other case it
doesn't. Same system, different boot order though, but I guess
the boot order doesn't matter. 
When it doesn't work, it gets or sends a mutual LOGO and gets into a weird state.

Anyway: as far as I interpreted it, in the vn2vn situation alle
enodes should know all other rports to prevent a double rport
assignment. I have no experience with FC itself though, so I
don't know what the normal state is but sending a LOGO due to a
failure to set up, is that ok?
Or is the vn2vn rport maintenance on another layer?

The 4.9 kernel is still to be baked...
I expect it to yield minor kmemleaks, until I start toggling the
FCoE.

Regards,
Ard van Breemen

PS: I removed some cross spamming from the Cc:, I hope that's ok.
-- 
.signature not found



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux