Re: System crashes with increased drive count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 14, 2014 at 6:57 PM, Nicholas A. Bellinger
<nab@xxxxxxxxxxxxxxx> wrote:
> On Wed, 2014-05-14 at 17:23 -0700, Jun Wu wrote:
>> On Wed, May 14, 2014 at 2:39 PM, Nicholas A. Bellinger
>> <nab@xxxxxxxxxxxxxxx> wrote:
>> > On Tue, 2014-05-13 at 14:52 -0700, Jun Wu wrote:
>> >> Hi Nicholas,
>> >>
>> >> We had to roll back system from 3.14 to 3.11 due to compile issues of
>> >> our software. So I am not able to verify your fix at this point.
>> >
>> > That is unfortunate your stuck on a now unsupported stable kernel.
>> > There are some other libfc related fixes that have gone in during the
>> > v3.13 timeframe, so I'd strongly recommend upgrading to at least that
>> > stable version.
>> >
>> > In any event, I'll be pushing that particular >= v3.13.y patch anyways,
>> > as it's a obvious regression bugfix for percpu-ida pre-allocation.
>> >
>> >> I ran the same tests on 3.11 instead.
>> >>
>> >> In one case the target crashed with following message:
>> >>
>> >> May 13 13:06:25 poc2 kernel: BUG: unable to handle kernel paging
>> >> request at ffffffffffffffa4
>> >> May 13 13:06:25 poc2 kernel: IP: [<ffffffff8164ac07>]
>> >> _raw_spin_lock_bh+0x17/0x40
>> >> May 13 13:06:25 poc2 kernel: PGD 1c0f067 PUD 1c11067 PMD 0
>> >> May 13 13:06:25 poc2 kernel: Oops: 0002 [#1] SMP
>> >> May 13 13:06:25 poc2 kernel: Modules linked in: fcoe libfcoe 8021q
>> > garp mrp tcm_fc libfc scsi_transport_fc scsi_tgt target_core_pscsi
>> >> target_core_file target_core_iblock iscsi_target_mod target_core_mod
>> >> ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute
>> >> bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
>> >> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
>> >> ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4
>> >> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
>> >> iptable_security iptable_raw nfsd auth_rpcgss nfs_acl lockd sunrpc
>> >> ixgbe mdio igb ptp pps_core serio_raw ses enclosure iTCO_wdt
>> >> iTCO_vendor_support lpc_ich mfd_core shpchp i2c_i801 coretemp
>> >> kvm_intel kvm crc32c_intel microcode i7core_edac ioatdma acpi_cpufreq
>> >> edac_core dca mperf radeon i2c_algo_bit
>> >> May 13 13:06:25 poc2 kernel: drm_kms_helper ttm drm ata_generic
>> >> i2c_core pata_acpi pata_jmicron aacraid
>> >> May 13 13:06:25 poc2 kernel: CPU: 0 PID: 1810 Comm: kworker/0:0 Not
>> >> tainted 3.11.10-301.fc20.x86_64 #1
>> >> May 13 13:06:25 poc2 kernel: Hardware name: Supermicro X8DTN/X8DTN,
>> >> BIOS 2.1c       10/28/2011
>> >> May 13 13:06:25 poc2 kernel: Workqueue: target_completion target_complete_ok_work [target_core_mod]
>> >> May 13 13:06:25 poc2 kernel: task: ffff88032c5096e0 ti: ffff88031bb78000 task.ti: ffff88031bb78000
>> >> May 13 13:06:25 poc2 kernel: RIP: 0010:[<ffffffff8164ac07>] [<ffffffff8164ac07>] _raw_spin_lock_bh+0x17/0x40
>> >> May 13 13:06:25 poc2 kernel: RSP: 0018:ffff88031bb79cf0  EFLAGS: 00010206
>> >> May 13 13:06:25 poc2 kernel: RAX: 0000000000000100 RBX: ffffffffffffffa4 RCX: 0000000000000000
>> >> May 13 13:06:25 poc2 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffa4
>> >> May 13 13:06:25 poc2 kernel: RBP: ffff88031bb79cf8 R08: 00000000ffffffff R09: ffff88031a37f678
>> >> May 13 13:06:25 poc2 kernel: R10: 0000000000000001 R11: 0000000000000044 R12: 0000000000000000
>> >> May 13 13:06:25 poc2 kernel: R13: ffff88031a37f678 R14: ffff88062d9fd6c8 R15: ffff88032c6da05c
>> >> May 13 13:06:25 poc2 kernel: FS:  0000000000000000(0000) GS:ffff880333c00000(0000) knlGS:0000000000000000
>> >> May 13 13:06:25 poc2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> >> May 13 13:06:25 poc2 kernel: CR2: ffffffffffffffa4 CR3: 0000000001c0c000 CR4: 00000000000007f0
>> >> May 13 13:06:25 poc2 kernel: Stack:
>> >> May 13 13:06:25 poc2 kernel: ffffffffffffffa4 ffff88031bb79d18 ffffffffa0594d2b ffff880328d13410
>> >> May 13 13:06:25 poc2 kernel: ffff88031a37c200 ffff88031bb79d58 ffffffffa05356f2 0000000000000018
>> >> May 13 13:06:25 poc2 kernel: ffff88062cea8800 0000000000000000 ffffea000c8eb640 0000000000000000
>> >> May 13 13:06:25 poc2 kernel: Call Trace:
>> >> May 13 13:06:25 poc2 kernel: [<ffffffffa0594d2b>] fc_seq_start_next+0x1b/0x40 [libfc]
>> >> May 13 13:06:25 poc2 kernel: [<ffffffffa05356f2>] ft_queue_status+0xf2/0x220 [tcm_fc]
>> >> May 13 13:06:25 poc2 kernel: [<ffffffffa0536972>] ft_queue_data_in+0x72/0x5a0 [tcm_fc]
>> >> May 13 13:06:25 poc2 kernel: [<ffffffffa04f57ba>] target_complete_ok_work+0x14a/0x2b0 [target_core_mod]
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff810810f5>] process_one_work+0x175/0x430
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff81081d1b>] worker_thread+0x11b/0x3a0
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff81081c00>] ? rescuer_thread+0x340/0x340
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff81088660>] kthread+0xc0/0xd0
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff810885a0>] ? insert_kthread_work+0x40/0x40
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff8165332c>] ret_from_fork+0x7c/0xb0
>> >> May 13 13:06:25 poc2 kernel: [<ffffffff810885a0>] ? insert_kthread_work+0x40/0x40
>> >> May 13 13:06:25 poc2 kernel: Code: 1f 44 00 00 f3 90 0f b6 07 38 d0 75
>> >> f7 5d c3 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb e8 7e
>> >> 05 a2 ff b8 00 01 00 00 <f0> 66 0f c1 03 0f b6 d4 38 c2 74 0e 0f 1f 44
>> >> 00 00 f3 90 0f b6
>> >> May 13 13:06:25 poc2 kernel: RIP  [<ffffffff8164ac07>] _raw_spin_lock_bh+0x17/0x40
>> >>
>> >
>> > So before we start debugging again, please confirm that this is a
>> > *completely* stock v3.11.10 build, and that your not building
>> > out-of-tree target modules again.
>>
>> Yes we installed official Fedora 20 iso image and then
>> yum install targetcli
>> modprobe fcoe
>>
>> Kernel version is 3.11.10-301.fc20.x86_64 which is a stock kernel distribution.
>>
>
> So I'd still recommend >= v3.13.y in order to pick-up the other libfc
> related bugfixes that have gone in since v3.11.y ran out of stable
> support some months ago.  I can't tell if that is related to the bug(s)
> that your hitting on both initiator + target sides here, but it
> definitely would not hurt.
>
> Also, here is the updated >= v3.13.y percpu-ida pre-allocation
> regression fix that will be going into v3.15-rc6 shortly.  Note this is
> different from the original version, so please make sure you use this
> one instead.

That's what we are going to do. Our software compiles on 3.13 kernel.
I will run some test once the systems are ready.

Thanks a lot.

>
> https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?id=e7c36ac16b91bf09e94f82d1838f0f9f114bed08
>
> --nab
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux