Hi Nicholas, and thanks for your reply.

I did realize that ERL=2 was causing trouble, and I switched to ERL=0 as you suggested.
However, after many days of spotless service (I have since switched to Linux 4.6.0), I got this in dmesg again (this time the kernel didn't crash):

[Tue Jun 14 18:23:49 2016] Received CmdSN: 0x02694b50 is less than ExpCmdSN: 0x02694b52, ignoring.
[Tue Jun 14 18:23:49 2016] iscsi_trx: page allocation failure: order:3, mode:0x208c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO)
[Tue Jun 14 18:23:49 2016] CPU: 1 PID: 13872 Comm: iscsi_trx Not tainted 4.6.0-gentoo #1
[Tue Jun 14 18:23:49 2016] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A01 08/13/2013
[Tue Jun 14 18:23:49 2016] 0000000000000000 ffffffff812b8d18 0000000000000000 ffff8801a4dfbc40
[Tue Jun 14 18:23:49 2016] ffffffff810d3d7d 0208c0201edf8838 fffffffffffffff8 0000000000000000
[Tue Jun 14 18:23:49 2016] ffff88021edd8800 0000000000000000 ffff88021edf8848 0000000000000008
[Tue Jun 14 18:23:49 2016] Call Trace:
[Tue Jun 14 18:23:49 2016] [<ffffffff812b8d18>] ? dump_stack+0x46/0x59
[Tue Jun 14 18:23:49 2016] [<ffffffff810d3d7d>] ? warn_alloc_failed+0x117/0x137
[Tue Jun 14 18:23:49 2016] [<ffffffff810d6203>] ? __alloc_pages_nodemask+0x895/0x939
[Tue Jun 14 18:23:49 2016] [<ffffffff8106afdf>] ? print_time.part.12+0x4f/0x52
[Tue Jun 14 18:23:49 2016] [<ffffffff81106b04>] ? alloc_pages_current+0xb1/0xd3
[Tue Jun 14 18:23:49 2016] [<ffffffff810e874c>] ? kmalloc_order+0xf/0x3a
[Tue Jun 14 18:23:49 2016] [<ffffffff810e89ed>] ? kmalloc_order_trace+0x19/0x8d
[Tue Jun 14 18:23:49 2016] [<ffffffffa023e6b8>] ? iscsit_dump_data_payload+0x50/0x181 [iscsi_target_mod]
[Tue Jun 14 18:23:49 2016] [<ffffffffa024a5aa>] ? iscsi_target_rx_thread+0xa48/0xa98 [iscsi_target_mod]
[Tue Jun 14 18:23:49 2016] [<ffffffff81011684>] ? __switch_to+0x157/0x392
[Tue Jun 14 18:23:49 2016] [<ffffffff8105f3b5>] ? dequeue_task_fair+0x163/0x1da
[Tue Jun 14 18:23:49 2016] [<ffffffffa0249b62>] ?
iscsi_target_tx_thread+0x1a5/0x1a5 [iscsi_target_mod]
[Tue Jun 14 18:23:49 2016] [<ffffffff81051a2b>] ? kthread+0x95/0x9d
[Tue Jun 14 18:23:49 2016] [<ffffffff81538ed2>] ? ret_from_fork+0x22/0x40
[Tue Jun 14 18:23:49 2016] [<ffffffff81051996>] ? init_completion+0x1d/0x1d
[Tue Jun 14 18:23:49 2016] Mem-Info:
[Tue Jun 14 18:23:49 2016] active_anon:196 inactive_anon:326 isolated_anon:0 active_file:877143 inactive_file:893182 isolated_file:0 unevictable:3780 dirty:9262 writeback:0 unstable:0 slab_reclaimable:46676 slab_unreclaimable:33780 mapped:1708 shmem:2 pagetables:236 bounce:0 free:40125 free_pcp:418 free_cma:0
[Tue Jun 14 18:23:49 2016] Node 0 DMA free:15852kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Tue Jun 14 18:23:49 2016] lowmem_reserve[]: 0 3414 7875 7875
[Tue Jun 14 18:23:49 2016] Node 0 DMA32 free:88792kB min:28384kB low:35480kB high:42576kB active_anon:364kB inactive_anon:240kB active_file:1509984kB inactive_file:1530632kB unevictable:9528kB isolated(anon):0kB isolated(file):0kB present:3578504kB managed:3502092kB mlocked:9528kB dirty:8156kB writeback:0kB mapped:4052kB shmem:8kB slab_reclaimable:80668kB slab_unreclaimable:78324kB kernel_stack:672kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:620kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no
[Tue Jun 14 18:23:49 2016] lowmem_reserve[]: 0 0 4460 4460
[Tue Jun 14 18:23:49 2016] Node 0 Normal free:55856kB min:37020kB low:46272kB high:55524kB active_anon:420kB inactive_anon:1064kB active_file:1998588kB inactive_file:2042096kB unevictable:5592kB isolated(anon):0kB isolated(file):0kB present:4700160kB managed:4568028kB mlocked:5592kB dirty:28892kB writeback:0kB mapped:2780kB shmem:0kB slab_reclaimable:106036kB slab_unreclaimable:56796kB kernel_stack:2016kB pagetables:376kB unstable:0kB bounce:0kB free_pcp:1052kB local_pcp:424kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[Tue Jun 14 18:23:49 2016] lowmem_reserve[]: 0 0 0 0
[Tue Jun 14 18:23:49 2016] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15852kB
[Tue Jun 14 18:23:49 2016] Node 0 DMA32: 3918*4kB (UME) 8928*8kB (U) 115*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 88936kB
[Tue Jun 14 18:23:49 2016] Node 0 Normal: 6115*4kB (UE) 3932*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55916kB
[Tue Jun 14 18:23:49 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Tue Jun 14 18:23:49 2016] 1771862 total pagecache pages
[Tue Jun 14 18:23:49 2016] 92 pages in swap cache
[Tue Jun 14 18:23:49 2016] Swap cache stats: add 4926, delete 4834, find 802213/803440
[Tue Jun 14 18:23:49 2016] Free swap = 10478708kB
[Tue Jun 14 18:23:49 2016] Total swap = 10485756kB
[Tue Jun 14 18:23:49 2016] 2073650 pages RAM
[Tue Jun 14 18:23:49 2016] 0 pages HighMem/MovableOnly
[Tue Jun 14 18:23:49 2016] 52157 pages reserved
[Tue Jun 14 18:23:49 2016] Unable to allocate 32768 bytes for offload buffer.
[Tue Jun 14 18:48:46 2016] Received CmdSN: 0x02696f4e is less than ExpCmdSN: 0x02696f4f, ignoring.
[Tue Jun 14 18:48:46 2016] iscsi_trx: page allocation failure: order:3, mode:0x208c020(GFP_ATOMIC|__GFP_COMP|__GFP_ZERO)
[Tue Jun 14 18:48:46 2016] CPU: 1 PID: 14614 Comm: iscsi_trx Not tainted 4.6.0-gentoo #1
[Tue Jun 14 18:48:46 2016] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A01 08/13/2013
[Tue Jun 14 18:48:46 2016] 0000000000000000 ffffffff812b8d18 0000000000000000 ffff880098e63c40
[Tue Jun 14 18:48:46 2016] ffffffff810d3d7d 0208c0201edf8838 fffffffffffffff8 0000000000000000
[Tue Jun 14 18:48:46 2016] ffff88021edd8800 0000000000000000 ffff88021edf8848 0000000000000008
[Tue Jun 14 18:48:46 2016] Call Trace:
[Tue Jun 14 18:48:46 2016] [<ffffffff812b8d18>] ? dump_stack+0x46/0x59
[Tue Jun 14 18:48:46 2016] [<ffffffff810d3d7d>] ? warn_alloc_failed+0x117/0x137
[Tue Jun 14 18:48:46 2016] [<ffffffff810d6203>] ? __alloc_pages_nodemask+0x895/0x939
[Tue Jun 14 18:48:46 2016] [<ffffffff8106afdf>] ? print_time.part.12+0x4f/0x52
[Tue Jun 14 18:48:46 2016] [<ffffffff81106b04>] ? alloc_pages_current+0xb1/0xd3
[Tue Jun 14 18:48:46 2016] [<ffffffff810e874c>] ? kmalloc_order+0xf/0x3a
[Tue Jun 14 18:48:46 2016] [<ffffffff810e89ed>] ? kmalloc_order_trace+0x19/0x8d
[Tue Jun 14 18:48:46 2016] [<ffffffffa023e6b8>] ? iscsit_dump_data_payload+0x50/0x181 [iscsi_target_mod]
[Tue Jun 14 18:48:46 2016] [<ffffffffa024a5aa>] ? iscsi_target_rx_thread+0xa48/0xa98 [iscsi_target_mod]
[Tue Jun 14 18:48:46 2016] [<ffffffff81011684>] ? __switch_to+0x157/0x392
[Tue Jun 14 18:48:46 2016] [<ffffffff8105f3b5>] ? dequeue_task_fair+0x163/0x1da
[Tue Jun 14 18:48:46 2016] [<ffffffffa0249b62>] ? iscsi_target_tx_thread+0x1a5/0x1a5 [iscsi_target_mod]
[Tue Jun 14 18:48:46 2016] [<ffffffff81051a2b>] ? kthread+0x95/0x9d
[Tue Jun 14 18:48:46 2016] [<ffffffff81538ed2>] ? ret_from_fork+0x22/0x40
[Tue Jun 14 18:48:46 2016] [<ffffffff81051996>] ?
init_completion+0x1d/0x1d
[Tue Jun 14 18:48:46 2016] Mem-Info:
[Tue Jun 14 18:48:46 2016] active_anon:164 inactive_anon:360 isolated_anon:0 active_file:871506 inactive_file:898134 isolated_file:32 unevictable:3780 dirty:9939 writeback:8 unstable:0 slab_reclaimable:46598 slab_unreclaimable:34865 mapped:1769 shmem:2 pagetables:236 bounce:0 free:39584 free_pcp:609 free_cma:0
[Tue Jun 14 18:48:46 2016] Node 0 DMA free:15852kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Tue Jun 14 18:48:46 2016] lowmem_reserve[]: 0 3414 7875 7875
[Tue Jun 14 18:48:46 2016] Node 0 DMA32 free:88564kB min:28384kB low:35480kB high:42576kB active_anon:236kB inactive_anon:368kB active_file:1504596kB inactive_file:1546616kB unevictable:9528kB isolated(anon):0kB isolated(file):0kB present:3578504kB managed:3502092kB mlocked:9528kB dirty:10140kB writeback:16kB mapped:4052kB shmem:8kB slab_reclaimable:80484kB slab_unreclaimable:80736kB kernel_stack:688kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:1300kB local_pcp:640kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no
[Tue Jun 14 18:48:46 2016] lowmem_reserve[]: 0 0 4460 4460
[Tue Jun 14 18:48:46 2016] Node 0 Normal free:54036kB min:37020kB low:46272kB high:55524kB active_anon:420kB inactive_anon:1072kB active_file:1981428kB inactive_file:2045792kB unevictable:5592kB isolated(anon):0kB isolated(file):128kB present:4700160kB managed:4568028kB mlocked:5592kB dirty:29616kB writeback:16kB mapped:3024kB shmem:0kB slab_reclaimable:105908kB slab_unreclaimable:58724kB kernel_stack:2016kB pagetables:376kB unstable:0kB bounce:0kB free_pcp:1140kB local_pcp:428kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
[Tue Jun 14 18:48:46 2016] lowmem_reserve[]: 0 0 0 0
[Tue Jun 14 18:48:46 2016] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15852kB
[Tue Jun 14 18:48:46 2016] Node 0 DMA32: 8577*4kB (UME) 5795*8kB (UM) 496*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 88604kB
[Tue Jun 14 18:48:46 2016] Node 0 Normal: 9263*4kB (UE) 2116*8kB (U) 13*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54188kB
[Tue Jun 14 18:48:46 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Tue Jun 14 18:48:46 2016] 1771195 total pagecache pages
[Tue Jun 14 18:48:46 2016] 92 pages in swap cache
[Tue Jun 14 18:48:46 2016] Swap cache stats: add 4928, delete 4836, find 802888/804117
[Tue Jun 14 18:48:46 2016] Free swap = 10478716kB
[Tue Jun 14 18:48:46 2016] Total swap = 10485756kB
[Tue Jun 14 18:48:46 2016] 2073650 pages RAM
[Tue Jun 14 18:48:46 2016] 0 pages HighMem/MovableOnly
[Tue Jun 14 18:48:46 2016] 52157 pages reserved
[Tue Jun 14 18:48:46 2016] Unable to allocate 32768 bytes for offload buffer.

N.B.
I have these settings in sysctl.conf (as suggested somewhere):

------
vm.min_free_kbytes = 65536
net.core.rmem_max = 1048576
net.core.rmem_default = 1048576
net.core.wmem_max = 1048576
net.core.wmem_default = 1048576
net.ipv4.tcp_mem = 1048576 1048576 1048576
net.ipv4.tcp_rmem = 1048576 1048576 1048576
net.ipv4.tcp_wmem = 1048576 1048576
------

I hope this report turns out to be useful.
Thanks for your work,
Edoardo

On Wed, May 18, 2016 at 9:34 AM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> Hi Edoardo,
>
> Apologies for the delayed follow up on your bug report.
>
> Comments inline below.
>
> On Wed, 2016-05-04 at 10:38 +0200, Edoardo wrote:
>> Hi all,
>> I'm having troubles in my iSCSI target server after the update to
>> linux-4.5.2. I've always had some trouble, but not this bad.
>> This time the kernel simply crashes printing out “fixing recursive
>> fault but reboot is needed”, but reboot is actually the only thing I
>> can do.
>> Fortunately I was able to catch the dmesg output thanks to a remote
>> syslog server.
>> Can you help me sort this out?
>> I'm also testing on btrfs filesystem, so some troubles may come from there.
>>
>> I attached my saved targetcli configuration, and paste the info and
>> the dmesg output
>>
>> uname -a :
>> Linux gentoo-SMB1 4.5.2-gentoo #2 SMP Tue Apr 26 11:36:10 CEST 2016
>> x86_64 Intel(R) Pentium(R) CPU G3220 @ 3.00GHz GenuineIntel GNU/Linux
>>
>
> In your lio_start.sh, all /sys/kernel/config/target/iscsi/$IQN/$TPGT/
> endpoints are changing parameter defaults ErrorRecoveryLevel=0 to
> ErrorRecoveryLevel=2 for the two active iscsi-target exports.
>
> The reason we default to ERL=0 is because MSFT initiators have long had
> problems not following the ERL=2's connection recovery state machine in
> RFC-3720, resulting in hung scsi miniport I/O and other MSFT host side
> issues.
>
> Of course, the type of memory allocation failure you've observed below
> should not be triggering a target OOPsen, but for getting stable
> MSFT iSCSI host setup you really need to be using ERL=0 defaults for all
> exports.
>
>> [ 1154.103989] ignoring deprecated emulate_fua_read attribute
>> [ 1154.104021] ignoring deprecated emulate_dpo attribute
>> [ 1709.899686] Unable to locate ITT: 0xad891600 on CID: 1, dumping payload
>> [ 1709.899750] Unable to locate ITT: 0xae891600 on CID: 1, dumping payload
>> [ 1709.899792] Unable to locate ITT: 0xaf891600 on CID: 1, dumping payload
>> [ 1709.899841] Unable to locate ITT: 0xb0891600 on CID: 1, dumping payload
>> [ 1709.899856] Unable to locate ITT: 0xb1891600 on CID: 1, dumping payload
>> [ 1710.138608] Unable to locate ITT: 0xb2891600 on CID: 1, dumping payload
>> [ 1714.873446] Unable to locate ITT: 0x5b8a1600 on CID: 1, dumping payload
>> [ 1714.873513] Unable to locate ITT: 0x5c8a1600 on CID: 1, dumping payload
>> [ 1714.873644] Unable to locate ITT: 0x5d8a1600 on CID: 1, dumping payload
>> [ 1714.873689] Unable to locate ITT: 0x5e8a1600 on CID: 1, dumping payload
>> [ 1714.876817] Unable to locate ITT: 0x5f8a1600 on CID: 1, dumping payload
>> [ 1752.823610] Unable to locate ITT: 0xfd8e1600 on CID: 1, dumping payload
>> [ 1774.958443] Unable to locate ITT: 0x07911600 on CID: 1, dumping payload
>> [ 1774.960190] Unable to locate ITT: 0x08911600 on CID: 1, dumping payload
>> [ 1774.961896] Unable to locate ITT: 0x09911600 on CID: 1, dumping payload
>> [ 1774.963396] Unable to locate ITT: 0x0a911600 on CID: 1, dumping payload
>> [ 1774.965143] Unable to locate ITT: 0x0b911600 on CID: 1, dumping payload
>> [ 1774.966932] Unable to locate ITT: 0x0c911600 on CID: 1, dumping payload
>> [ 1774.968395] Unable to locate ITT: 0x0d911600 on CID: 1, dumping payload
>> [ 1864.857610] Unable to locate ITT: 0x54951600 on CID: 1, dumping payload
>> [ 1864.859108] Unable to locate ITT: 0x55951600 on CID: 1, dumping payload
>> [
1868.897251] Unable to locate ITT: 0x73951600 on CID: 1, dumping payload
>> [ 1872.249432] Unable to locate ITT: 0x90951600 on CID: 1, dumping payload
>> [ 1872.249436] iscsi_trx: page allocation failure: order:3, mode:0x208c020
>> [ 1872.249439] CPU: 0 PID: 5031 Comm: iscsi_trx Not tainted 4.5.2-gentoo #2
>> [ 1872.249440] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A01 08/13/2013
>
> <SNIP>
>
>> [ 1872.249556] Unable to allocate 32768 bytes for offload buffer.
>> [ 1872.249559] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249560] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249561] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249563] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249564] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249565] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249566] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249567] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249568] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249569] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249570] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249571] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249573] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249574] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249576] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249577] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249578] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249579] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249583] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249584] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249585] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249586] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249587] NopOUT Flag's, Left Most Bit not set, protocol error.
>> [ 1872.249588] Got unknown iSCSI OpCode: 0x5b
>> [ 1872.249590] Unable to recover from unknown opcode while OFMarker=No, closing iSCSI connection.
>> [ 1903.106918] Unable to locate ITT: 0xea961600 on CID: 1, dumping payload
>> [ 1903.106974] Unable to locate ITT: 0xeb961600 on CID: 1, dumping payload
>> [ 1903.107017] Unable to locate ITT: 0xec961600 on CID: 1, dumping payload
>> [ 1903.107054] Unable to locate ITT: 0xed961600 on CID: 1, dumping payload
>> [ 1915.539126] Unable to locate ITT: 0x90971600 on CID: 1, dumping payload
>> [ 1915.539132] iscsi_trx: page allocation failure: order:3, mode:0x208c020
>> [ 1915.539135] CPU: 0 PID: 5284 Comm: iscsi_trx Not tainted 4.5.2-gentoo #2
>> [ 1915.539136] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A01 08/13/2013
>
> <SNIP>
>
>> [ 1915.539269] Unable to allocate 32768 bytes for offload buffer.
>
> It's strange that smallish order:3 memory allocations begin to fail this
> early..
>
>> [ 1915.539271] Got unknown iSCSI OpCode: 0x33
>> [ 1915.539272] Unable to recover from unknown opcode while OFMarker=No, closing iSCSI connection.
>> [ 1915.547337] BUG: unable to handle kernel paging request at ffffc900017e4100
>> [ 1915.547478] IP: [<ffffffffa03cfbf4>] iscsit_free_connection_recovery_entires+0x1d9/0x278 [iscsi_target_mod]
>> [ 1915.547592] PGD 21608a067 PUD 21608b067 PMD d9520067 PTE 0
>> [ 1915.547805] Oops: 0000 [#1] SMP
>> [ 1915.547933] Modules linked in: tcm_loop iscsi_target_mod
>> target_core_pscsi target_core_file target_core_iblock target_core_mod
>> kvm_intel kvm irqbypass crc32c_intel e1000e [last unloaded:
>> target_core_mod]
>> [ 1915.548539] CPU: 1 PID: 4150 Comm: kworker/1:1 Not tainted 4.5.2-gentoo #2
>> [ 1915.548597] Hardware name: Dell Inc.
PowerEdge T20/0VD5HY, BIOS A01 08/13/2013
>> [ 1915.548662] Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod]
>> [ 1915.548759] task: ffff8801a6d1bf00 ti: ffff8801d95d4000 task.ti: ffff8801d95d4000
>> [ 1915.548817] RIP: 0010:[<ffffffffa03cfbf4>] [<ffffffffa03cfbf4>] iscsit_free_connection_recovery_entires+0x1d9/0x278 [iscsi_target_mod]
>> [ 1915.548933] RSP: 0018:ffff8801d95d7c98 EFLAGS: 00010246
>> [ 1915.548987] RAX: ffff8800b68ad858 RBX: ffff8801c96e3000 RCX: ffffc900017e4100
>> [ 1915.549043] RDX: 0000000000000001 RSI: 00000001800c000b RDI: ffff8800b68ad868
>> [ 1915.549100] RBP: ffff8801c96e3070 R08: 0000000000000001 R09: ffffffffa03db052
>> [ 1915.549197] R10: ffffea0000b54000 R11: 0000000000100001 R12: ffff8801c96e3110
>> [ 1915.549294] R13: ffff8800b68ad868 R14: ffff8800b68ad840 R15: ffffc900017e3f00
>> [ 1915.549392] FS: 0000000000000000(0000) GS:ffff88021eb00000(0000) knlGS:0000000000000000
>> [ 1915.549531] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1915.549625] CR2: ffffc900017e4100 CR3: 00000000c0ad1000 CR4: 00000000000406a0
>> [ 1915.549721] Stack:
>> [ 1915.549806] ffff8800b68ad858 ffff8801c96e30f8 ffff880211a96000 ffff8801c96e3000
>> [ 1915.550103] ffff880061b4f050 ffff880061b4f000 ffff880061b4f050 ffff880214ec99c0
>> [ 1915.550398] ffff880211a96000 ffffffffa03db06a ffffffffa038845c ffff88002d500a74
>> [ 1915.550695] Call Trace:
>> [ 1915.550788] [<ffffffffa03db06a>] ? iscsit_close_session+0xe5/0x164 [iscsi_target_mod]
>> [ 1915.550936] [<ffffffffa038845c>] ? atomic_dec_mb+0x4/0x4 [target_core_mod]
>> [ 1915.551039] [<ffffffffa0388bf5>] ? kref_put+0x2e/0x36 [target_core_mod]
>> [ 1915.551142] [<ffffffffa03d0676>] ? iscsi_check_for_session_reinstatement+0x1bf/0x1d0 [iscsi_target_mod]
>> [ 1915.551290] [<ffffffffa03d24d4>] ? iscsi_target_do_login+0x2f9/0x4e8 [iscsi_target_mod]
>> [ 1915.551434] [<ffffffffa03d315f>] ? iscsi_target_do_login_rx+0x165/0x1e9 [iscsi_target_mod]
>> [ 1915.551578] [<ffffffffa03d1e67>] ?
iscsi_target_restore_sock_callbacks+0x8a/0x8a [iscsi_target_mod]
>> [ 1915.551722] [<ffffffff8104c608>] ? process_one_work+0x19c/0x2b9
>> [ 1915.551818] [<ffffffff8104ca5d>] ? worker_thread+0x1d9/0x2ad
>> [ 1915.551912] [<ffffffff8104c884>] ? cancel_delayed_work_sync+0xa/0xa
>> [ 1915.552010] [<ffffffff810507b7>] ? kthread+0x95/0x9d
>> [ 1915.552104] [<ffffffff81050722>] ? kthread_parkme+0x16/0x16
>> [ 1915.552204] [<ffffffff81538d5f>] ? ret_from_fork+0x3f/0x70
>> [ 1915.552293] [<ffffffff81050722>] ? kthread_parkme+0x16/0x16
>> [ 1915.552382] Code: 86 90 00 00 00 c6 83 10 01 00 00 00 4d 8d 6e 28
>> 4c 89 ef e8 2e 8c 16 e1 49 8b 4e 18 49 8d 46 18 48 89 04 24 4c 8d b9
>> 00 fe ff ff <48> 8b 09 48 81 e9 00 02 00 00 49 8d bf 00 02 00 00 48 3b
>> 3c 24
>> [ 1915.555105] RIP [<ffffffffa03cfbf4>] iscsit_free_connection_recovery_entires+0x1d9/0x278 [iscsi_target_mod]
>> [ 1915.555283] RSP <ffff8801d95d7c98>
>> [ 1915.555367] CR2: ffffc900017e4100
>> [ 1915.555451] ---[ end trace 405a6f5266a8bf99 ]---
>>
>
> Ok, so after dumping repeated TCP payloads and two memory allocation
> failures, iscsi login eventually hits a kernel paging OOPs after
> subsequent ERL=2 connection login failure occurs.
>
> I still need to ponder the scenario for a proper bug-fix might work, but
> the issue itself is ERL=2 specific and does not effect ERL=0 operation.
>
> So as a work-around, go ahead and change to ERL=0 in your lio_start.sh
> for both IQN+TargetPortalGroupTag endpoints. Note you'll need to force
> reconnect all MSFT sessions after changing this configfs attribute for
> the new values to take effect.
>
>>
>> Thanks for the help,
>> --
>> Edoardo Liverani
>> --
>
> Thanks for reporting!
>
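
For reference, the per-zone free-list lines in the allocation-failure reports in this thread can be read mechanically. The sketch below is only an illustration, not anything from the kernel or from LIO: the helper names `parse_buddy_line` and `free_blocks_at_or_above` are hypothetical, and it assumes 4 kB base pages (as on x86_64), so that order:3 means 8 contiguous pages, i.e. the 32768 bytes named in the "Unable to allocate" message. It parses the "Node 0 Normal" line from the 18:23:49 report and counts free blocks large enough for an order:3 request.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, as on x86_64


def parse_buddy_line(line):
    """Map block size in kB to free block count for one 'Node 0 <zone>: ...' line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def free_blocks_at_or_above(free, order):
    """Count free blocks big enough to satisfy a single allocation of this order."""
    need_kb = PAGE_KB << order  # order 3 -> 32 kB
    return sum(count for size, count in free.items() if size >= need_kb)


# Line copied from the 18:23:49 report in this message.
normal = ("Node 0 Normal: 6115*4kB (UE) 3932*8kB (UM) 0*16kB 0*32kB 0*64kB "
          "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55916kB")

free = parse_buddy_line(normal)
total_kb = sum(size * count for size, count in free.items())

print("free in zone:", total_kb, "kB")
print("blocks usable for order:3:", free_blocks_at_or_above(free, 3))
```

Read this way, the Normal zone has about 55 MB free but no free block of 32 kB or larger, and the DMA32 lines show the same pattern. That is consistent with fragmentation rather than an outright shortage of memory; a GFP_ATOMIC request cannot sleep, so it presumably cannot wait for reclaim or compaction to assemble a larger block.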