Re: XFS kernel errors bringing up OSD

I believe it's already been fixed in 4.13.1:

https://github.com/torvalds/linux/commit/cd87d867920155911d0d2e6485b769d853547750#diff-69e107fa3b585a125ef74b5ecafd424e

We put that kernel on the storage servers that were having the issue
and it went away.  I'm hoping they backport it to the 4.12 or 4.9 kernels.
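
For anyone wanting to check which hosts are still on an older kernel,
here is a rough sketch in Python. It assumes the fix really does first
appear in 4.13.1 (as I believe, see above) and that your kernel has not
had the patch backported by a distro, so treat the result as a hint only.
The names parse_kernel_release and has_fix are made up for illustration.

```python
import platform
import re

# Assumption: first release believed to contain the fix, per this message.
# Not verified against the stable trees or any distro backports.
FIXED_IN = (4, 13, 1)


def parse_kernel_release(release):
    """Return (major, minor, patch) from e.g. '4.12.10-041210-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = m.groups()
    # A missing patch level (e.g. '4.13') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def has_fix(release, fixed_in=FIXED_IN):
    """True if the release is at or above the assumed fixed version."""
    return parse_kernel_release(release) >= fixed_in


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` reports
    status = "at or above" if has_fix(release) else "older than"
    print(f"{release}: {status} {'.'.join(map(str, FIXED_IN))}")
```

This compares version numbers only; a vendor kernel with an older
version string could still carry the patch.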



On Wed, Sep 13, 2017 at 11:01 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Tue, 2017-09-12 at 09:25 -0400, Wyllys Ingersoll wrote:
>> Ceph 10.2.7
>> Kernel 4.12.10
>>
>> We are seeing frequent kernel errors that cause the XFS based OSD
>> processes to crash and restart.  Has anyone seen or reported something
>> like this before?  Maybe due to bad or failing disks, but it's hard to
>> tell.
>>
>>
>>
>> [Tue Sep 12 09:18:32 2017] BUG: unable to handle kernel NULL pointer
>> dereference at 0000000000000090
>> [Tue Sep 12 09:18:32 2017] IP: xfs_da3_node_read+0x2e/0xb0 [xfs]
>> [Tue Sep 12 09:18:32 2017] PGD 0
>> [Tue Sep 12 09:18:32 2017] P4D 0
>>
>> [Tue Sep 12 09:18:32 2017] Oops: 0000 [#23] SMP
>> [Tue Sep 12 09:18:32 2017] Modules linked in: binfmt_misc xfs
>> libcrc32c dm_crypt intel_rapl x86_pkg_temp_thermal ipmi_ssif
>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
>> crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
>> input_leds crypto_simd glue_helper cryptd shpchp intel_cstate
>> intel_rapl_perf lpc_ich mei_me mei mac_hid ipmi_si ipmi_devintf
>> ipmi_msghandler acpi_power_meter acpi_pad 8021q garp mrp stp llc
>> bonding autofs4 btrfs xor raid6_pq ses enclosure mlx4_en hid_generic
>> ttm usbhid hid drm_kms_helper syscopyarea igb sysfillrect e1000e dca
>> sysimgblt fb_sys_fops mlx4_core mpt3sas ptp ahci devlink drm
>> raid_class pps_core libahci scsi_transport_sas i2c_algo_bit
>> [Tue Sep 12 09:18:32 2017] CPU: 8 PID: 40382 Comm: tp_fstore_op
>> Tainted: G      D         4.12.10-041210-generic #201708300614
>> [Tue Sep 12 09:18:32 2017] Hardware name: AIC SB303-LB/LIBRA, BIOS
>> LIBKV070 08/03/2016
>> [Tue Sep 12 09:18:32 2017] task: ffff8f03b4220000 task.stack: ffff9a6a75ff0000
>> [Tue Sep 12 09:18:32 2017] RIP: 0010:xfs_da3_node_read+0x2e/0xb0 [xfs]
>> [Tue Sep 12 09:18:32 2017] RSP: 0018:ffff9a6a75ff3d30 EFLAGS: 00010282
>> [Tue Sep 12 09:18:32 2017] RAX: 0000000000000000 RBX: ffff8f08b8ce9d98
>> RCX: 0000000000000001
>> [Tue Sep 12 09:18:32 2017] RDX: ffffffffc0a37700 RSI: 0000000000000000
>> RDI: ffff9a6a75ff3cd8
>> [Tue Sep 12 09:18:32 2017] RBP: ffff9a6a75ff3d48 R08: 00000000ffffffff
>> R09: 0000000000000001
>> [Tue Sep 12 09:18:32 2017] R10: 0000000000000001 R11: 0000000000000001
>> R12: ffff9a6a75ff3d78
>> [Tue Sep 12 09:18:32 2017] R13: 0000000000000005 R14: 00000000894e93b5
>> R15: ffff8f1536502010
>> [Tue Sep 12 09:18:32 2017] FS:  00007f82c9b70700(0000)
>> GS:ffff8f26ffc00000(0000) knlGS:0000000000000000
>> [Tue Sep 12 09:18:32 2017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Tue Sep 12 09:18:32 2017] CR2: 0000000000000090 CR3: 00000017cf710000
>> CR4: 00000000001406e0
>> [Tue Sep 12 09:18:32 2017] Call Trace:
>> [Tue Sep 12 09:18:32 2017]  xfs_attr3_node_inactive+0xd0/0x230 [xfs]
>> [Tue Sep 12 09:18:32 2017]  xfs_attr_inactive+0x267/0x280 [xfs]
>> [Tue Sep 12 09:18:32 2017]  xfs_inactive+0xe2/0x110 [xfs]
>> [Tue Sep 12 09:18:32 2017]  xfs_fs_destroy_inode+0x9f/0x200 [xfs]
>> [Tue Sep 12 09:18:32 2017]  destroy_inode+0x3b/0x60
>> [Tue Sep 12 09:18:32 2017]  evict+0x136/0x1a0
>> [Tue Sep 12 09:18:32 2017]  iput+0x14c/0x220
>> [Tue Sep 12 09:18:32 2017]  do_unlinkat+0x1a7/0x310
>> [Tue Sep 12 09:18:32 2017]  SyS_unlink+0x16/0x20
>> [Tue Sep 12 09:18:32 2017]  entry_SYSCALL_64_fastpath+0x1e/0xa9
>> [Tue Sep 12 09:18:32 2017] RIP: 0033:0x7f82d7753ea7
>> [Tue Sep 12 09:18:32 2017] RSP: 002b:00007f82c9b6d2e8 EFLAGS: 00000246
>> ORIG_RAX: 0000000000000057
>> [Tue Sep 12 09:18:32 2017] RAX: ffffffffffffffda RBX: 00005606b600e000
>> RCX: 00007f82d7753ea7
>> [Tue Sep 12 09:18:32 2017] RDX: 00007f82c9b6d2a0 RSI: 0000000000000000
>> RDI: 00005606bfd32a80
>> [Tue Sep 12 09:18:32 2017] RBP: 000056033335ab20 R08: 0000000000450000
>> R09: 0000000000000001
>> [Tue Sep 12 09:18:32 2017] R10: 0000000000000000 R11: 0000000000000246
>> R12: 00007f82da606c60
>> [Tue Sep 12 09:18:32 2017] R13: 00005606812ebd60 R14: 00000000040ffda5
>> R15: 00005606dfb64a60
>> [Tue Sep 12 09:18:32 2017] Code: 00 00 55 48 89 e5 41 54 53 4d 89 c4
>> 48 89 fb 48 83 ec 08 68 00 77 a3 c0 e8 e0 fe ff ff 85 c0 5a 75 46 48
>> 85 db 74 41 49 8b 34 24 <48> 8b 96 90 00 00 00 0f b7 52 08 66 c1 c2 08
>> 66 81 fa be 3e 74
>> [Tue Sep 12 09:18:32 2017] RIP: xfs_da3_node_read+0x2e/0xb0 [xfs] RSP:
>> ffff9a6a75ff3d30
>> [Tue Sep 12 09:18:32 2017] CR2: 0000000000000090
>
> That's pretty clearly a kernel bug. I'd report that to the xfs mailing
> list (linux-xfs@xxxxxxxxxxxxxxx).
> --
> Jeff Layton <jlayton@xxxxxxxxxx>