Hi,

I'm also tracking this at the moment. My suspicion is that it affects older XFS filesystems that have gone through a lot of hard reboots lately.

I raised this on the XFS mailing list a few days ago and Darrick picked it up. For me it's happening on kernel 4.9.43.

Christian

> On Sep 1, 2017, at 5:40 PM, kefu chai <tchaikov@xxxxxxxxx> wrote:
>
> On Fri, Sep 1, 2017 at 11:02 PM, Wyllys Ingersoll
> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>> Ceph 10.2.7
>> Ubuntu 16.04.2
>> Kernel 4.4.031
>>
>> ceph-disk activate is failing to activate our OSDs on a server with 16
>> disks. Journals and Data are colocated on same disks. The kernel log
>> is showing the following errors, does this look like a known bug?
>
> it was reported before, https://www.spinics.net/lists/ceph-users/msg36628.html
>
>> Would a newer kernel possibly help?
>
> not sure. probably the guys on linux-xfs[0] mailing list can answer this query.
>
> --
> [0] http://vger.kernel.org/vger-lists.html#linux-xfs
>
>>
>> [Fri Sep 1 06:02:17 2017] BUG: unable to handle kernel NULL pointer
>> dereference at 00000000000000a0
>> [Fri Sep 1 06:02:17 2017] IP: [<ffffffffc061a5a0>]
>> xfs_da3_node_read+0x30/0xb0 [xfs]
>> [Fri Sep 1 06:02:17 2017] PGD 0
>> [Fri Sep 1 06:02:17 2017] Oops: 0000 [#3] SMP
>> [Fri Sep 1 06:02:17 2017] Modules linked in: xfs libcrc32c drbg
>> ansi_cprng dm_crypt binfmt_misc ipmi_devintf intel_rapl
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>> crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw ipmi_ssif
>> sb_edac gf128mul edac_core glue_helper ablk_helper mei_me lpc_ich
>> input_leds cryptd mei shpchp 8250_fintek ipmi_si ipmi_msghandler
>> acpi_power_meter acpi_pad mac_hid 8021q garp mrp stp llc bonding
>> autofs4 btrfs xor raid6_pq ses enclosure mlx4_en vxlan ip6_udp_tunnel
>> udp_tunnel ttm drm_kms_helper syscopyarea igb sysfillrect sysimgblt
>> hid_generic e1000e fb_sys_fops dca usbhid mpt3sas ahci ptp mlx4_core
>> drm hid raid_class libahci pps_core scsi_transport_sas i2c_algo_bit
>> fjes
>> [Fri Sep 1 06:02:17 2017] CPU: 1 PID: 13217 Comm: tp_fstore_op
>> Tainted: G D 4.4.0-31-generic #50-Ubuntu
>> [Fri Sep 1 06:02:17 2017] Hardware name: AIC SB303-LB/LIBRA, BIOS
>> LIBKV070 08/03/2016
>> [Fri Sep 1 06:02:17 2017] task: ffff882f57940dc0 ti: ffff882ee9af0000
>> task.ti: ffff882ee9af0000
>> [Fri Sep 1 06:02:17 2017] RIP: 0010:[<ffffffffc061a5a0>]
>> [<ffffffffc061a5a0>] xfs_da3_node_read+0x30/0xb0 [xfs]
>> [Fri Sep 1 06:02:17 2017] RSP: 0018:ffff882ee9af3d00 EFLAGS: 00010282
>> [Fri Sep 1 06:02:17 2017] RAX: 0000000000000000 RBX: ffff880860d62740
>> RCX: 0000000000000001
>> [Fri Sep 1 06:02:17 2017] RDX: 0000000000000000 RSI: 0000000000000000
>> RDI: ffff882ee9af3cb0
>> [Fri Sep 1 06:02:17 2017] RBP: ffff882ee9af3d20 R08: 0000000000000001
>> R09: fffffffffffffffe
>> [Fri Sep 1 06:02:17 2017] R10: ffff8807c374e1d0 R11: 0000000000000001
>> R12: ffff882ee9af3d50
>> [Fri Sep 1 06:02:17 2017] R13: ffff881ad14d9dc0 R14: 0000000000000009
>> R15: 000000003bb6d4fa
>> [Fri Sep 1 06:02:17 2017] FS: 00007f178d54b700(0000)
>> GS:ffff881820040000(0000) knlGS:0000000000000000
>> [Fri Sep 1 06:02:17 2017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Fri Sep 1 06:02:17 2017] CR2: 00000000000000a0 CR3: 0000002f54061000
>> CR4: 00000000001406e0
>> [Fri Sep 1 06:02:17 2017] Stack:
>> [Fri Sep 1 06:02:17 2017] ffffffffc0679b50 ffffffffc065aebc
>> ffff882ee9af3de0 0000000000000009
>> [Fri Sep 1 06:02:17 2017] ffff882ee9af3d98 ffffffffc0636893
>> 0000000200000008 ffff880eef834010
>> [Fri Sep 1 06:02:17 2017] 00000001660a7d00 ffff8824d80fbd80
>> 0000000000000000 0000000000000000
>> [Fri Sep 1 06:02:17 2017] Call Trace:
>> [Fri Sep 1 06:02:17 2017] [<ffffffffc065aebc>] ?
>> xfs_trans_roll+0x2c/0x50 [xfs]
>> [Fri Sep 1 06:02:17 2017] [<ffffffffc0636893>]
>> xfs_attr3_node_inactive+0x183/0x220 [xfs]
>> [Fri Sep 1 06:02:17 2017] [<ffffffffc06369dc>]
>> xfs_attr3_root_inactive+0xac/0x100 [xfs]
>> [Fri Sep 1 06:02:17 2017] [<ffffffffc0636b7c>]
>> xfs_attr_inactive+0x14c/0x1a0 [xfs]
>> [Fri Sep 1 06:02:17 2017] [<ffffffffc0650d95>] xfs_inactive+0x85/0x120 [xfs]
>> [Fri Sep 1 06:02:17 2017] [<ffffffffc06562e5>]
>> xfs_fs_evict_inode+0xa5/0x100 [xfs]
>> [Fri Sep 1 06:02:17 2017] [<ffffffff8122887e>] evict+0xbe/0x190
>> [Fri Sep 1 06:02:17 2017] [<ffffffff81228b61>] iput+0x1c1/0x240
>> [Fri Sep 1 06:02:17 2017] [<ffffffff8121d659>] do_unlinkat+0x199/0x2d0
>> [Fri Sep 1 06:02:17 2017] [<ffffffff8121e1f6>] SyS_unlink+0x16/0x20
>> [Fri Sep 1 06:02:17 2017] [<ffffffff8182db32>]
>> entry_SYSCALL_64_fastpath+0x16/0x71
>> [Fri Sep 1 06:02:17 2017] Code: 55 48 89 e5 41 54 53 4d 89 c4 48 89
>> fb 48 83 ec 10 48 c7 04 24 50 9b 67 c0 e8 dd fe ff ff 85 c0 75 46 48
>> 85 db 74 41 49 8b 34 24 <48> 8b 96 a0 00 00 00 0f b7 52 08 66 c1 c2 08
>> 66 81 fa be 3e 74
>> [Fri Sep 1 06:02:17 2017] RIP [<ffffffffc061a5a0>]
>> xfs_da3_node_read+0x30/0xb0 [xfs]
>> [Fri Sep 1 06:02:17 2017] RSP <ffff882ee9af3d00>
>> [Fri Sep 1 06:02:17 2017] CR2: 00000000000000a0
>> [Fri Sep 1 06:02:17 2017] ---[ end trace d41664a5b9f3d7d2 ]---
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Regards
> Kefu Chai

Kind regards,
Christian Theune

--
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Germany
HR Stendal HRB 21169 · Managing Directors: Christian Theune, Christian Zagrodnick