Good morning - even after the upgrade to 4.1.0-rc2 I got the crash this AM:

May 5 04:36:14 roc-4r-scd214 kernel: [15004.439254] BUG: Bad page state in process LIOTarget pfn:104a1aa
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439289] page:ffffea0041286a80 count:-1 mapcount:0 mapping: (null) index:0x0
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439335] flags: 0x6ffff0000000000()
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439362] page dumped because: nonzero _count
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439387] Modules linked in: rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core joydev mei_me lpc_ich mei nfsd ioatdma ses auth_rpcgss enclosure nfs_acl ipmi_si nfs 8250_fintek ipmi_msghandler 8021q lockd garp mrp grace stp llc sunrpc bonding wmi mac_hid fscache lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid igb hid mpt2sas i2c_algo_bit dca ahci ptp raid_class mlx4_core libahci scsi_transport_sas pps_core
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439456] CPU: 9 PID: 242884 Comm: LIOTarget Tainted: G C OE 4.1.0-040100rc2-generic #201505032335
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439458] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439460] ffffffff81cc6ef1 ffff88085a023ad8 ffffffff817f6e5a 0000000000000007
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439463] ffffea0041286a80 ffff88085a023b08 ffffffff8118f6f6 ffff880843308a10
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439466] ffffea0041286a80 0000000000000000 00000000002284d0 ffff88085a023b58
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439469] Call Trace:
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439481] [<ffffffff817f6e5a>] dump_stack+0x45/0x57
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439487] [<ffffffff8118f6f6>] bad_page.part.70+0xc6/0x110
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439490] [<ffffffff8118f924>] prep_new_page+0x1e4/0x1f0
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439494] [<ffffffff81193aed>] get_page_from_freelist+0x2bd/0x720
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439497] [<ffffffff811940df>] __alloc_pages_nodemask+0x18f/0x9c0
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439504] [<ffffffff811f5de9>] ? memcg_check_events+0x29/0x50
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439508] [<ffffffff81199cab>] ? lru_cache_add_active_or_unevictable+0x2b/0x90
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439514] [<ffffffff811db51c>] alloc_pages_current+0x9c/0x110
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439520] [<ffffffff8106fa9b>] pte_alloc_one+0x1b/0x50
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439526] [<ffffffff811b92b2>] __pte_alloc+0x32/0x180
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439530] [<ffffffff811bc342>] __handle_mm_fault+0x342/0x360
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439533] [<ffffffff811bc412>] handle_mm_fault+0xb2/0x160
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439537] [<ffffffff8106a610>] __do_page_fault+0x190/0x470
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439540] [<ffffffff8106aa77>] do_page_fault+0x37/0x90
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439544] [<ffffffff81805b58>] page_fault+0x28/0x30
May 5 04:36:14 roc-4r-scd214 kernel: [15004.439547] Disabling lock debugging due to kernel taint
May 5 04:36:19 roc-4r-scd214 kernel: [15007.409541] general protection fault: 0000 [#1] SMP
May 5 04:36:19 roc-4r-scd214 kernel: [15007.409575] Modules linked in: rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core joydev mei_me lpc_ich mei nfsd ioatdma ses auth_rpcgss enclosure nfs_acl ipmi_si nfs 8250_fintek ipmi_msghandler 8021q lockd garp mrp grace stp llc sunrpc bonding wmi mac_hid fscache lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid igb hid mpt2sas i2c_algo_bit dca ahci ptp raid_class mlx4_core libahci scsi_transport_sas pps_core
May 5 04:36:19 roc-4r-scd214 kernel: [15007.410074] CPU: 9 PID: 242920 Comm: la

On Mon, May 4, 2015 at 10:54 PM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote:
> Thanks Mike, will test right away with 4.1.0-rc2. Just got another
> crash, ceph did not seem to freeze at all on this one:
>
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357214] BUG: Bad page
> state in process LIOLogicalUnit pfn:36a84
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357247]
> page:ffffea0000daa100 count:-1 mapcount:0 mapping: (null)
> index:0x0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357292] flags: 0x1ffff0000000000()
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357318] page dumped
> because: nonzero _count
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357343] Modules linked
> in: target_core_user uio rbd libceph libcrc32c iscsi_target_mod
> target_core_file target_core_pscsi target_core_iblock target_core_mod
> configfs xt_multiport iptable_filter ip_tables x_tables
> enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE)
> ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev sb_edac
> edac_core lpc_ich mei_me mei ioatdma ipmi_si ipmi_msghandler
> 8250_fintek wmi mac_hid 8021q garp mrp stp llc bonding lp nfsd parport
> auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache mlx4_en vxlan
> ip6_udp_tunnel udp_tunnel hid_generic ahci libahci igb usbhid hid
> i2c_algo_bit mpt2sas dca mlx4_core ptp raid_class scsi_transport_sas
> pps_core
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357408] CPU: 17 PID:
> 360893 Comm: LIOLogicalUnit Tainted: G C OE
> 3.19.4-031904-generic #201504131440
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357409] Hardware name:
> Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
> 12/05/2013
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357411] ffffffff81acc6c2
> ffff88105be1ba50 ffffffff817c6cd7 0000000000000007
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357415] ffffea0000daa100
> ffff88105be1ba80 ffffffff811816c6 0000000000000030
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357418] ffffea0000daa100
> 0000000000000000 00000000002284d0 ffff88105be1ba90
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357421] Call Trace:
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357429]
> [<ffffffff817c6cd7>] dump_stack+0x45/0x57
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357434]
> [<ffffffff811816c6>] bad_page.part.58+0xc6/0x110
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357437]
> [<ffffffff81181728>] bad_page+0x18/0x30
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357440]
> [<ffffffff81181aac>] prep_new_page+0x1bc/0x1d0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357443]
> [<ffffffff811859d0>] get_page_from_freelist+0x420/0x6f0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357447]
> [<ffffffff81185e11>] __alloc_pages_nodemask+0x171/0xaf0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357452]
> [<ffffffff811d7cdd>] ? kmem_cache_alloc+0x19d/0x210
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357455]
> [<ffffffff81185e11>] ? __alloc_pages_nodemask+0x171/0xaf0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357459]
> [<ffffffff811cd71c>] alloc_pages_current+0x9c/0x110
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357462]
> [<ffffffff8118143e>] __get_free_pages+0xe/0x40
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357466]
> [<ffffffff81069a81>] pgd_alloc+0x21/0x1f0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357470]
> [<ffffffff810741b4>] mm_init+0x164/0x1d0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357475]
> [<ffffffff817ba6d0>] dup_mm+0x8b/0x125
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357477]
> [<ffffffff817ba830>] copy_mm+0xc6/0xe9
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357481]
> [<ffffffff810755e1>] copy_process.part.29+0x681/0xea0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357485]
> [<ffffffff811b2351>] ? vma_link+0xd1/0xe0
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357488]
> [<ffffffff81075e80>] copy_process+0x80/0x90
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357491]
> [<ffffffff81075fa2>] do_fork+0x62/0x280
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357495]
> [<ffffffff8108301f>] ? recalc_sigpending+0x1f/0x60
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357499]
> [<ffffffff81083a47>] ? __set_task_blocked+0x37/0x80
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357502]
> [<ffffffff81076246>] SyS_clone+0x16/0x20
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357506]
> [<ffffffff817d3b19>] stub_clone+0x69/0x90
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357509]
> [<ffffffff817d37cd>] ? system_call_fastpath+0x16/0x1b
> May 4 22:23:53 roc-4r-scd212 kernel: [20499.357511] Disabling lock
> debugging due to kernel taint
> May 4 22:23:54 roc-4r-scd212 kernel: [20499.592770] ABORT_TASK: Found
> referenced iSCSI task_tag: 74711
> May 4 22:23:54 roc-4r-scd212 kernel: [20499.592775] Unexpected ret:
> -32 send data 48
> May 4 22:23:54 roc-4r-scd212 kernel: [20499.592806] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 74711
> May 4 22:23:56 roc-4r-scd212 kernel: [20502.043663] ABORT_TASK: Found
> referenced iSCSI task_tag: 107196
> May 4 22:23:56 roc-4r-scd212 kernel: [20502.043671] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 107196
> May 4 22:23:56 roc-4r-scd212 kernel: [20502.043674] Unable to locate
> ITT: 0x0001a2bc on CID: 0
> May 4 22:23:56 roc-4r-scd212 kernel: [20502.043674] Unable to locate
> RefTaskTag: 0x0001a2bc on CID: 0.
> May 4 22:23:56 roc-4r-scd212 kernel: [20502.043732] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107197
> May 4 22:23:56 roc-4r-scd212 kernel: [20502.043736] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107197
> May 4 22:24:10 roc-4r-scd212 kernel: [20515.628746] ABORT_TASK: Found
> referenced iSCSI task_tag: 107206
> May 4 22:24:10 roc-4r-scd212 kernel: [20515.628750] ABORT_TASK:
> ref_tag: 107206 already complete, skipping
> May 4 22:24:10 roc-4r-scd212 kernel: [20515.628751] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107206
> May 4 22:24:15 roc-4r-scd212 kernel: [20520.596113] general
> protection fault: 0000 [#1] SMP
> May 4 22:24:15 roc-4r-scd212 kernel: [20520.596148] Modules linked
> in: target_core_user uio rbd libceph libcrc32c iscsi_target_mod
> target_core_file target_core_pscsi target_core_iblock target_core_mod
> configfs xt_multiport iptable_filter
>
> On Mon, May 4, 2015 at 1:47 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>> On 05/04/2015 12:29 PM, Mike Christie wrote:
>>> On 05/04/2015 11:59 AM, Mike Christie wrote:
>>>> On 04/30/2015 01:47 AM, Nicholas A. Bellinger wrote:
>>>>> AFAICT from Robert + Alex's log this is the same type of scenario, and
>>>>> I'm pretty sure I was hitting the same login timeout handler back then,
>>>>> and was able to survive at least with iblock + scsi_debug backend.
>>>>>
>>>>> Given that ceph and enhanceio backend are involved, it's not completely
>>>>> clear yet if this is a target specific issue or not..
>>>>>
>>>>> Mike, what's the setup you're able to reproduce with..?
>>>>
>>>> For the LIO crash part I was using scsi_debug and iblock. I just do a dd
>>>> to the fc/iscsi device exported by LIO. Then on the LIO box I do
>>>>
>>>
>>> Scratch the FC part above. For FC I thought I was hitting a crash in the
>>> abort related code when doing the same test.
>>>
>>
>> Actually, you can maybe ignore everything about the LIO crash. I now
>> tested iscsi and fc with 4.1.0-rc2 and they both work ok for me now.
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html