Re: Update on crash with kernel 3.19

Good morning. Even after the upgrade to 4.1.0-rc2 I got the crash again this morning:

May  5 04:36:14 roc-4r-scd214 kernel: [15004.439254] BUG: Bad page
state in process LIOTarget  pfn:104a1aa
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439289]
page:ffffea0041286a80 count:-1 mapcount:0 mapping:          (null)
index:0x0
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439335] flags: 0x6ffff0000000000()
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439362] page dumped
because: nonzero _count
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439387] Modules linked
in: rbd libceph libcrc32c iscsi_target_mod target_core_file
target_core_pscsi target_core_iblock target_core_mod configfs
xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE)
enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf
ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac
edac_core joydev mei_me lpc_ich mei nfsd ioatdma ses auth_rpcgss
enclosure nfs_acl ipmi_si nfs 8250_fintek ipmi_msghandler 8021q lockd
garp mrp grace stp llc sunrpc bonding wmi mac_hid fscache lp parport
mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid igb hid
mpt2sas i2c_algo_bit dca ahci ptp raid_class mlx4_core libahci
scsi_transport_sas pps_core
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439456] CPU: 9 PID:
242884 Comm: LIOTarget Tainted: G         C OE
4.1.0-040100rc2-generic #201505032335
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439458] Hardware name:
Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
12/05/2013
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439460]  ffffffff81cc6ef1
ffff88085a023ad8 ffffffff817f6e5a 0000000000000007
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439463]  ffffea0041286a80
ffff88085a023b08 ffffffff8118f6f6 ffff880843308a10
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439466]  ffffea0041286a80
0000000000000000 00000000002284d0 ffff88085a023b58
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439469] Call Trace:
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439481]
[<ffffffff817f6e5a>] dump_stack+0x45/0x57
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439487]
[<ffffffff8118f6f6>] bad_page.part.70+0xc6/0x110
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439490]
[<ffffffff8118f924>] prep_new_page+0x1e4/0x1f0
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439494]
[<ffffffff81193aed>] get_page_from_freelist+0x2bd/0x720
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439497]
[<ffffffff811940df>] __alloc_pages_nodemask+0x18f/0x9c0
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439504]
[<ffffffff811f5de9>] ? memcg_check_events+0x29/0x50
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439508]
[<ffffffff81199cab>] ? lru_cache_add_active_or_unevictable+0x2b/0x90
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439514]
[<ffffffff811db51c>] alloc_pages_current+0x9c/0x110
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439520]
[<ffffffff8106fa9b>] pte_alloc_one+0x1b/0x50
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439526]
[<ffffffff811b92b2>] __pte_alloc+0x32/0x180
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439530]
[<ffffffff811bc342>] __handle_mm_fault+0x342/0x360
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439533]
[<ffffffff811bc412>] handle_mm_fault+0xb2/0x160
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439537]
[<ffffffff8106a610>] __do_page_fault+0x190/0x470
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439540]
[<ffffffff8106aa77>] do_page_fault+0x37/0x90
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439544]
[<ffffffff81805b58>] page_fault+0x28/0x30
May  5 04:36:14 roc-4r-scd214 kernel: [15004.439547] Disabling lock
debugging due to kernel taint
May  5 04:36:19 roc-4r-scd214 kernel: [15007.409541] general
protection fault: 0000 [#1] SMP
May  5 04:36:19 roc-4r-scd214 kernel: [15007.409575] Modules linked
in: rbd libceph libcrc32c iscsi_target_mod target_core_file
target_core_pscsi target_core_iblock target_core_mod configfs
xt_multiport iptable_filter ip_tables x_tables enhanceio_rand(OE)
enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) ipmi_devintf
ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac
edac_core joydev mei_me lpc_ich mei nfsd ioatdma ses auth_rpcgss
enclosure nfs_acl ipmi_si nfs 8250_fintek ipmi_msghandler 8021q lockd
garp mrp grace stp llc sunrpc bonding wmi mac_hid fscache lp parport
mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid igb hid
mpt2sas i2c_algo_bit dca ahci ptp raid_class mlx4_core libahci
scsi_transport_sas pps_core
May  5 04:36:19 roc-4r-scd214 kernel: [15007.410074] CPU: 9 PID: 242920 Comm: la
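
To make it easier to compare the "Bad page state" reports from the two hosts, here is a minimal Python sketch that rejoins the mail-wrapped syslog lines and pulls out the process name, pfn, and page count. It is only an illustration under stated assumptions: the function names (unwrap, bad_page_reports) are hypothetical, it assumes the syslog prefix format seen in this thread, and it treats a blank line as the end of a wrapped entry.

```python
import re

# Syslog prefix as it appears in this thread, e.g.
# "May  5 04:36:14 roc-4r-scd214 kernel: [15004.439254] "
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: \[\s*\d+\.\d+\]\s?")
QUOTE = re.compile(r"^(>\s?)+")  # leading email quote markers
BAD_PAGE = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")
PAGE = re.compile(r"page:([0-9a-f]+) count:(-?\d+) mapcount:(-?\d+)")


def unwrap(lines):
    """Rejoin wrapped log entries (hypothetical helper).

    A line that starts with the syslog prefix opens a new entry; any
    following non-blank line without a prefix is treated as its continuation.
    """
    entries = []
    open_entry = False
    for raw in lines:
        line = QUOTE.sub("", raw.rstrip("\n"))
        if PREFIX.match(line):
            entries.append(PREFIX.sub("", line, count=1).strip())
            open_entry = True
        elif not line.strip():
            open_entry = False  # blank line ends the current entry
        elif open_entry:
            entries[-1] = (entries[-1] + " " + line.strip()).strip()
    return entries


def bad_page_reports(lines):
    """Collect one dict per 'Bad page state' report (hypothetical helper)."""
    reports = []
    for entry in unwrap(lines):
        m = BAD_PAGE.search(entry)
        if m:
            reports.append({"process": m.group(1), "pfn": m.group(2)})
            continue
        m = PAGE.search(entry)
        if m and reports and "count" not in reports[-1]:
            reports[-1]["page"] = m.group(1)
            reports[-1]["count"] = int(m.group(2))
    return reports
```

Feeding it the lines of this message (for example via open(path).readlines()) should yield one record per report, which makes it quick to check whether each crash shows the same page count.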

On Mon, May 4, 2015 at 10:54 PM, Robert Wood <rwood@xxxxxxxxxxxxxxxxxxx> wrote:
> Thanks Mike, will test right away with 4.1.0-rc2.  Just got another
> crash; ceph did not seem to freeze at all on this one:
>
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357214] BUG: Bad page
> state in process LIOLogicalUnit  pfn:36a84
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357247]
> page:ffffea0000daa100 count:-1 mapcount:0 mapping:          (null)
> index:0x0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357292] flags: 0x1ffff0000000000()
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357318] page dumped
> because: nonzero _count
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357343] Modules linked
> in: target_core_user uio rbd libceph libcrc32c iscsi_target_mod
> target_core_file target_core_pscsi target_core_iblock target_core_mod
> configfs xt_multiport iptable_filter ip_tables x_tables
> enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE)
> ipmi_devintf ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
> aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev sb_edac
> edac_core lpc_ich mei_me mei ioatdma ipmi_si ipmi_msghandler
> 8250_fintek wmi mac_hid 8021q garp mrp stp llc bonding lp nfsd parport
> auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache mlx4_en vxlan
> ip6_udp_tunnel udp_tunnel hid_generic ahci libahci igb usbhid hid
> i2c_algo_bit mpt2sas dca mlx4_core ptp raid_class scsi_transport_sas
> pps_core
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357408] CPU: 17 PID:
> 360893 Comm: LIOLogicalUnit Tainted: G         C OE
> 3.19.4-031904-generic #201504131440
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357409] Hardware name:
> Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
> 12/05/2013
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357411]  ffffffff81acc6c2
> ffff88105be1ba50 ffffffff817c6cd7 0000000000000007
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357415]  ffffea0000daa100
> ffff88105be1ba80 ffffffff811816c6 0000000000000030
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357418]  ffffea0000daa100
> 0000000000000000 00000000002284d0 ffff88105be1ba90
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357421] Call Trace:
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357429]
> [<ffffffff817c6cd7>] dump_stack+0x45/0x57
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357434]
> [<ffffffff811816c6>] bad_page.part.58+0xc6/0x110
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357437]
> [<ffffffff81181728>] bad_page+0x18/0x30
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357440]
> [<ffffffff81181aac>] prep_new_page+0x1bc/0x1d0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357443]
> [<ffffffff811859d0>] get_page_from_freelist+0x420/0x6f0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357447]
> [<ffffffff81185e11>] __alloc_pages_nodemask+0x171/0xaf0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357452]
> [<ffffffff811d7cdd>] ? kmem_cache_alloc+0x19d/0x210
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357455]
> [<ffffffff81185e11>] ? __alloc_pages_nodemask+0x171/0xaf0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357459]
> [<ffffffff811cd71c>] alloc_pages_current+0x9c/0x110
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357462]
> [<ffffffff8118143e>] __get_free_pages+0xe/0x40
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357466]
> [<ffffffff81069a81>] pgd_alloc+0x21/0x1f0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357470]
> [<ffffffff810741b4>] mm_init+0x164/0x1d0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357475]
> [<ffffffff817ba6d0>] dup_mm+0x8b/0x125
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357477]
> [<ffffffff817ba830>] copy_mm+0xc6/0xe9
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357481]
> [<ffffffff810755e1>] copy_process.part.29+0x681/0xea0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357485]
> [<ffffffff811b2351>] ? vma_link+0xd1/0xe0
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357488]
> [<ffffffff81075e80>] copy_process+0x80/0x90
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357491]
> [<ffffffff81075fa2>] do_fork+0x62/0x280
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357495]
> [<ffffffff8108301f>] ? recalc_sigpending+0x1f/0x60
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357499]
> [<ffffffff81083a47>] ? __set_task_blocked+0x37/0x80
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357502]
> [<ffffffff81076246>] SyS_clone+0x16/0x20
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357506]
> [<ffffffff817d3b19>] stub_clone+0x69/0x90
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357509]
> [<ffffffff817d37cd>] ? system_call_fastpath+0x16/0x1b
> May  4 22:23:53 roc-4r-scd212 kernel: [20499.357511] Disabling lock
> debugging due to kernel taint
> May  4 22:23:54 roc-4r-scd212 kernel: [20499.592770] ABORT_TASK: Found
> referenced iSCSI task_tag: 74711
> May  4 22:23:54 roc-4r-scd212 kernel: [20499.592775] Unexpected ret:
> -32 send data 48
> May  4 22:23:54 roc-4r-scd212 kernel: [20499.592806] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 74711
> May  4 22:23:56 roc-4r-scd212 kernel: [20502.043663] ABORT_TASK: Found
> referenced iSCSI task_tag: 107196
> May  4 22:23:56 roc-4r-scd212 kernel: [20502.043671] ABORT_TASK:
> Sending TMR_FUNCTION_COMPLETE for ref_tag: 107196
> May  4 22:23:56 roc-4r-scd212 kernel: [20502.043674] Unable to locate
> ITT: 0x0001a2bc on CID: 0
> May  4 22:23:56 roc-4r-scd212 kernel: [20502.043674] Unable to locate
> RefTaskTag: 0x0001a2bc on CID: 0.
> May  4 22:23:56 roc-4r-scd212 kernel: [20502.043732] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107197
> May  4 22:23:56 roc-4r-scd212 kernel: [20502.043736] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107197
> May  4 22:24:10 roc-4r-scd212 kernel: [20515.628746] ABORT_TASK: Found
> referenced iSCSI task_tag: 107206
> May  4 22:24:10 roc-4r-scd212 kernel: [20515.628750] ABORT_TASK:
> ref_tag: 107206 already complete, skipping
> May  4 22:24:10 roc-4r-scd212 kernel: [20515.628751] ABORT_TASK:
> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 107206
> May  4 22:24:15 roc-4r-scd212 kernel: [20520.596113] general
> protection fault: 0000 [#1] SMP
> May  4 22:24:15 roc-4r-scd212 kernel: [20520.596148] Modules linked
> in: target_core_user uio rbd libceph libcrc32c iscsi_target_mod
> target_core_file target_core_pscsi target_core_iblock target_core_mod
> configfs xt_multiport iptable_filter
>
> On Mon, May 4, 2015 at 1:47 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>> On 05/04/2015 12:29 PM, Mike Christie wrote:
>>> On 05/04/2015 11:59 AM, Mike Christie wrote:
>>>> On 04/30/2015 01:47 AM, Nicholas A. Bellinger wrote:
>>>>> AFAICT from Robert + Alex's log this is the same type of scenario, and
>>>>> I'm pretty sure I was hitting the same login timeout handler back then,
>>>>> and was able to survive at least with iblock + scsi_debug backend.
>>>>>
>>>>> Given that ceph and the enhanceio backend are involved, it's not
>>>>> completely clear yet if this is a target-specific issue or not..
>>>>>
>>>>> Mike, what's the setup you're able to reproduce with..?
>>>>
>>>> For the LIO crash part I was using scsi_debug and iblock. I just do a dd
>>>> to the fc/iscsi device exported by LIO. Then on the LIO box I do
>>>>
>>>
>>> Scratch the FC part above. For FC I thought I was hitting a crash in the
>>> abort related code when doing the same test.
>>>
>>
>> Actually, you can maybe ignore everything about the LIO crash. I now
>> tested iscsi and fc with 4.1.0-rc2 and they both work ok for me now.