On 12/7/18 4:25 PM, Ross Lagerwall wrote:
Hi Hannes,
Commit bbc0f8bd88ab ("scsi: libfc: Add WARN_ON() when deleting rports")
added a warning whose intent was to check whether the rport was still
linked into the peer list. It doesn't work as intended and I
consistently see messages like the following:
[ 66.157471] host1: rport fffcd1: work delete
[ 66.157518] WARNING: CPU: 1 PID: 151 at
drivers/scsi/libfc/fc_rport.c:187 fc_rport_destroy+0x29/0x30 [libfc]
[ 66.157519] Modules linked in: nfsv3 nfs_acl nfs lockd grace fscache
bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc ebtable_broute
ebtables bridge 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4
xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 iptable_filter dm_multipath sunrpc sb_edac
intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc
aesni_intel aes_x86_64 crypto_simd dm_mod cryptd lpc_ich glue_helper
i2c_i801 psmouse hpilo ipmi_si sg intel_rapl_perf ipmi_devintf
ipmi_msghandler hed acpi_power_meter xen_wdt ip_tables x_tables sd_mod
uhci_hcd serio_raw xhci_pci ehci_pci ehci_hcd xhci_hcd hpsa bnx2x
scsi_transport_sas mdio libcrc32c scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc
scsi_dh_alua scsi_mod ipv6 crc_ccitt
[ 66.157599] CPU: 1 PID: 151 Comm: kworker/u32:1 Not tainted 4.19.0+0 #1
[ 66.157601] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
[ 66.157608] Workqueue: fc_rport_eq fc_rport_work [libfc]
[ 66.157616] RIP: e030:fc_rport_destroy+0x29/0x30 [libfc]
[ 66.157619] Code: 00 0f 1f 44 00 00 48 8b 87 c0 00 00 00 48 8d 97 c0
00 00 00 48 39 c2 75 11 48 81 c7 f8 00 00 00 be 08 01 00 00 e9 27 2b bb
c0 <0f> 0b eb eb 0f 1f 00 0f 1f 44 00 00 48 8b 3d 0c 4f 01 00 e9 9f 77
[ 66.157621] RSP: e02b:ffffc900405cfe08 EFLAGS: 00010287
[ 66.157624] RAX: ffff880190ca96d0 RBX: ffff88016b734800 RCX:
0000000000000000
[ 66.157625] RDX: ffff88016b7348d0 RSI: 0000000000000001 RDI:
ffff88016b734810
[ 66.157627] RBP: ffff88016b734848 R08: 0000000000000000 R09:
000000000000095b
[ 66.157629] R10: 0000000000000004 R11: 0000000000000000 R12:
ffff88016b7fb7e8
[ 66.157630] R13: ffff8801904de000 R14: ffff88016b7348e0 R15:
ffff88016b734810
[ 66.157654] FS: 0000000000000000(0000) GS:ffff880193640000(0000)
knlGS:0000000000000000
[ 66.157656] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.157658] CR2: 00007f2aac406010 CR3: 000000018b0d6000 CR4:
0000000000042660
[ 66.157669] Call Trace:
[ 66.157681] fc_rport_work+0x1e3/0x740 [libfc]
[ 66.157693] process_one_work+0x165/0x370
[ 66.157698] worker_thread+0x49/0x3e0
[ 66.157704] kthread+0xf8/0x130
[ 66.157708] ? rescuer_thread+0x310/0x310
[ 66.157712] ? kthread_bind+0x10/0x10
[ 66.157719] ret_from_fork+0x35/0x40
[ 66.157724] ---[ end trace 78306fa56e2fdf61 ]---
This happens erroneously for two reasons:
1) If the rport is never linked into the peer list it will not be
considered empty since the list_head is never initialized.
2) If the rport is deleted from the peer list using list_del_rcu(), then
the list_head is in an undefined state and it is not considered empty.
I'm not sure how to check that the rport has been removed from the peer
list without iterating through the list. Can this check either be
removed or fixed?
Hmm. Guess you are right; one could trying to do tricks with call_rcu()
and move the WARN_ON() in there, but this doesn't relieve the problem
that we can't re-init the list head after list_del_rcu(), as this would
defeat the very purpose of using RCU lists in the first place.
Oh well, guess we have to revert it.
Cheers,
Hannes