Thanks Mike, will test right away with 4.1.0-rc2.

Just got another crash; ceph did not seem to freeze at all on this one:

May 4 22:23:53 roc-4r-scd212 kernel: [20499.357214] BUG: Bad page state in process LIOLogicalUnit pfn:36a84
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357247] page:ffffea0000daa100 count:-1 mapcount:0 mapping: (null) index:0x0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357292] flags: 0x1ffff0000000000()
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357318] page dumped because: nonzero _count
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357343] Modules linked in: target_core_user uio rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev sb_edac edac_core lpc_ich mei_me mei ioatdma ipmi_si ipmi_msghandler 8250_fintek wmi mac_hid 8021q garp mrp stp llc bonding lp nfsd parport auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic ahci libahci igb usbhid hid i2c_algo_bit mpt2sas dca mlx4_core ptp raid_class scsi_transport_sas pps_core
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357408] CPU: 17 PID: 360893 Comm: LIOLogicalUnit Tainted: G C OE 3.19.4-031904-generic #201504131440
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357409] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357411] ffffffff81acc6c2 ffff88105be1ba50 ffffffff817c6cd7 0000000000000007
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357415] ffffea0000daa100 ffff88105be1ba80 ffffffff811816c6 0000000000000030
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357418] ffffea0000daa100 0000000000000000 00000000002284d0 ffff88105be1ba90
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357421] Call Trace:
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357429] [<ffffffff817c6cd7>] dump_stack+0x45/0x57
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357434] [<ffffffff811816c6>] bad_page.part.58+0xc6/0x110
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357437] [<ffffffff81181728>] bad_page+0x18/0x30
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357440] [<ffffffff81181aac>] prep_new_page+0x1bc/0x1d0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357443] [<ffffffff811859d0>] get_page_from_freelist+0x420/0x6f0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357447] [<ffffffff81185e11>] __alloc_pages_nodemask+0x171/0xaf0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357452] [<ffffffff811d7cdd>] ? kmem_cache_alloc+0x19d/0x210
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357455] [<ffffffff81185e11>] ? __alloc_pages_nodemask+0x171/0xaf0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357459] [<ffffffff811cd71c>] alloc_pages_current+0x9c/0x110
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357462] [<ffffffff8118143e>] __get_free_pages+0xe/0x40
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357466] [<ffffffff81069a81>] pgd_alloc+0x21/0x1f0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357470] [<ffffffff810741b4>] mm_init+0x164/0x1d0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357475] [<ffffffff817ba6d0>] dup_mm+0x8b/0x125
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357477] [<ffffffff817ba830>] copy_mm+0xc6/0xe9
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357481] [<ffffffff810755e1>] copy_process.part.29+0x681/0xea0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357485] [<ffffffff811b2351>] ? vma_link+0xd1/0xe0
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357488] [<ffffffff81075e80>] copy_process+0x80/0x90
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357491] [<ffffffff81075fa2>] do_fork+0x62/0x280
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357495] [<ffffffff8108301f>] ? recalc_sigpending+0x1f/0x60
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357499] [<ffffffff81083a47>] ? __set_task_blocked+0x37/0x80
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357502] [<ffffffff81076246>] SyS_clone+0x16/0x20
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357506] [<ffffffff817d3b19>] stub_clone+0x69/0x90
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357509] [<ffffffff817d37cd>] ? system_call_fastpath+0x16/0x1b
May 4 22:23:53 roc-4r-scd212 kernel: [20499.357511] Disabling lock debugging due to kernel taint
May 4 22:23:54 roc-4r-scd212 kernel: [20499.592770] ABORT_TASK: Found referenced iSCSI task_tag: 74711
May 4 22:23:54 roc-4r-scd212 kernel: [20499.592775] Unexpected ret: -32 send data 48
May 4 22:23:54 roc-4r-scd212 kernel: [20499.592806] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 74711
May 4 22:23:56 roc-4r-scd212 kernel: [20502.043663] ABORT_TASK: Found referenced iSCSI task_tag: 107196
May 4 22:23:56 roc-4r-scd212 kernel: [20502.043671] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 107196
May 4 22:23:56 roc-4r-scd212 kernel: [20502.043674] Unable to locate ITT: 0x0001a2bc on CID: 0
May 4 22:23:56 roc-4r-scd212 kernel: [20502.043674] Unable to locate RefTaskTag: 0x0001a2bc on CID: 0.
May 4 22:23:56 roc-4r-scd212 kernel: [20502.043732] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107197
May 4 22:23:56 roc-4r-scd212 kernel: [20502.043736] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107197
May 4 22:24:10 roc-4r-scd212 kernel: [20515.628746] ABORT_TASK: Found referenced iSCSI task_tag: 107206
May 4 22:24:10 roc-4r-scd212 kernel: [20515.628750] ABORT_TASK: ref_tag: 107206 already complete, skipping
May 4 22:24:10 roc-4r-scd212 kernel: [20515.628751] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107206
May 4 22:24:15 roc-4r-scd212 kernel: [20520.596113] general protection fault: 0000 [#1] SMP
May 4 22:24:15 roc-4r-scd212 kernel: [20520.596148] Modules linked in: target_core_user uio rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter

On Mon, May 4, 2015 at 1:47 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
> On 05/04/2015 12:29 PM, Mike Christie wrote:
>> On 05/04/2015 11:59 AM, Mike Christie wrote:
>>> On 04/30/2015 01:47 AM, Nicholas A. Bellinger wrote:
>>>> AFAICT from Robert + Alex's log this is the same type of scenario, and
>>>> I'm pretty sure I was hitting the same login timeout handler back then,
>>>> and was able to survive at least with iblock + scsi_debug backend.
>>>>
>>>> Given that ceph and enhancedio backend are involved, it's not completely
>>>> clear yet if this is a target specific issue or not..
>>>>
>>>> Mike, what's the setup you're able to reproduce with..?
>>>
>>> For the LIO crash part I was using scsi_debug and iblock. I just do a dd
>>> to the fc/iscsi device exported by LIO. Then on the LIO box I do
>>>
>>
>> Scratch the FC part above. For FC I thought I was hitting a crash in the
>> abort related code when doing the same test.
>>
>
> Actually, you can maybe ignore everything about the LIO crash. I now
> tested iscsi and fc with 4.1.0-rc2 and they both work ok for me now.