Re: centos 7.6 kernel panic caused by osd

On Fri, Jan 11, 2019 at 12:20 AM Rom Freiman <rom@xxxxxxxxxxxxxxx> wrote:
>
> Hey,
> After upgrading to CentOS 7.6, I started encountering the following kernel panic:
>
> [17845.147263] XFS (rbd4): Unmounting Filesystem
> [17846.860221] rbd: rbd4: capacity 3221225472 features 0x1
> [17847.109887] XFS (rbd4): Mounting V5 Filesystem
> [17847.191646] XFS (rbd4): Ending clean mount
> [17861.663757] rbd: rbd5: capacity 3221225472 features 0x1
> [17862.930418] usercopy: kernel memory exposure attempt detected from ffff9d54d26d8800 (kmalloc-512) (1024 bytes)
> [17862.941698] ------------[ cut here ]------------
> [17862.946854] kernel BUG at mm/usercopy.c:72!
> [17862.951524] invalid opcode: 0000 [#1] SMP
> [17862.956123] Modules linked in: vhost_net vhost macvtap macvlan tun xt_REDIRECT nf_nat_redirect ip6table_mangle xt_nat xt_mark xt_connmark xt_CHECKSUM ip6table_raw xt_physdev iptable_mangle veth iptable_raw rbd libceph dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables xt_comment mlx4_en(OE) mlx4_core(OE) xt_multiport ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc xfs openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack mlx5_core(OE) mlxfw(OE) iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass pcspkr joydev sg mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si ipmi_devintf ipmi_msghandler
> [17863.036328]  dm_multipath ip_tables ext4 mbcache jbd2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200 igb aesni_intel isci lrw gf128mul glue_helper ablk_helper ahci drm_kms_helper cryptd libsas dca syscopyarea sysfillrect sysimgblt fb_sys_fops ttm libahci scsi_transport_sas ptp drm libata pps_core mlx_compat(OE) drm_panel_orientation_quirks i2c_algo_bit devlink wmi scsi_transport_iscsi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mlx4_core]
> [17863.094372] CPU: 3 PID: 71755 Comm: msgr-worker-1 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.1.3.el7.x86_64 #1
> [17863.107673] Hardware name: Intel Corporation S2600JF/S2600JF, BIOS SE5C600.86B.02.06.0006.032420170950 03/24/2017
> [17863.119134] task: ffff9d4e8e33e180 ti: ffff9d53dbaf8000 task.ti: ffff9d53dbaf8000
> [17863.127489] RIP: 0010:[<ffffffffa5e3e167>]  [<ffffffffa5e3e167>] __check_object_size+0x87/0x250
> [17863.137217] RSP: 0018:ffff9d53dbafbb98  EFLAGS: 00010246
> [17863.143140] RAX: 0000000000000062 RBX: ffff9d54d26d8800 RCX: 0000000000000000
> [17863.151106] RDX: 0000000000000000 RSI: ffff9d557bad3898 RDI: ffff9d557bad3898
> [17863.159072] RBP: ffff9d53dbafbbb8 R08: 0000000000000000 R09: 0000000000000000
> [17863.167038] R10: 0000000000000d0f R11: ffff9d53dbafb896 R12: 0000000000000400
> [17863.175001] R13: 0000000000000001 R14: ffff9d54d26d8c00 R15: 0000000000000400
> [17863.182968] FS:  00007f531fa98700(0000) GS:ffff9d557bac0000(0000) knlGS:0000000000000000
> [17863.192001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [17863.198414] CR2: 00007f4438516930 CR3: 0000000f19236000 CR4: 00000000001627e0
> [17863.206379] Call Trace:
> [17863.209114]  [<ffffffffa5f8c0dd>] memcpy_toiovec+0x4d/0xb0
> [17863.215240]  [<ffffffffa622a858>] skb_copy_datagram_iovec+0x128/0x280
> [17863.222434]  [<ffffffffa629172a>] tcp_recvmsg+0x22a/0xb30
> [17863.228463]  [<ffffffffa62c00e0>] inet_recvmsg+0x80/0xb0
> [17863.234395]  [<ffffffffa62186ec>] sock_aio_read.part.9+0x14c/0x170
> [17863.241297]  [<ffffffffa5cd676b>] ? wake_up_q+0x5b/0x80
> [17863.247129]  [<ffffffffa6218731>] sock_aio_read+0x21/0x30
> [17863.253157]  [<ffffffffa5e40743>] do_sync_read+0x93/0xe0
> [17863.259087]  [<ffffffffa5e41225>] vfs_read+0x145/0x170
> [17863.264823]  [<ffffffffa5e4203f>] SyS_read+0x7f/0xf0
> [17863.270366]  [<ffffffffa6374ddb>] system_call_fastpath+0x22/0x27
> [17863.277061] Code: 45 d1 48 c7 c6 d4 b6 67 a6 48 c7 c1 e0 4b 68 a6 48 0f 45 f1 49 89 c0 4d 89 e1 48 89 d9 48 c7 c7 d0 1a 68 a6 31 c0 e8 20 d5 51 00 <0f> 0b 0f 1f 80 00 00 00 00 48 c7 c0 00 00 c0 a5 4c 39 f0 73 0d
> [17863.298802] RIP  [<ffffffffa5e3e167>] __check_object_size+0x87/0x250
> [17863.305912]  RSP <ffff9d53dbafbb98>
>
> It seems to be related to rbd operations, but I cannot pinpoint the reason directly.

To me this looks like an issue in the networking subsystem, and at this
stage nothing implicates the ceph kernel modules: the call trace goes
from read() through tcp_recvmsg() to memcpy_toiovec() and contains no
rbd or libceph frames.

If the Mellanox modules are involved in any way I would start looking
there (not because I am biased against them, but because experience
tells me that is the place to start) and then move on to the other
networking modules and the kernel more generally. The usercopy check
fired on a 1024-byte copy out of a kmalloc-512 object, which looks like
some sort of memory accounting error in the networking subsystem. I
could be wrong, of course, but further data would be needed to tell
either way. A good next step would be to capture a vmcore and get
someone to analyse it for you.
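
While waiting on a vmcore, the two facts that matter most can be pulled
out of the oops text mechanically. What follows is only a minimal
Python sketch, not something from the original report: the function
names parse_usercopy and flagged_modules are made up for illustration,
and the regular expressions assume the log format shown in the quoted
oops. It reports how far the requested copy length exceeds the size
implied by the kmalloc cache name, and lists the modules that carry
taint flags such as (OE), meaning out-of-tree and unsigned.

```python
import re

# Matches the hardened-usercopy report line as it appears in the quoted oops.
# Accepting any word for "kind" and either "from" or "to" is an assumption
# made so that other variants of the same message would also parse.
USERCOPY_RE = re.compile(
    r"usercopy: kernel memory (?P<kind>\w+) attempt detected (?:from|to) "
    r"(?P<addr>[0-9a-f]+) \((?P<cache>[^)]+)\) \((?P<length>\d+) bytes\)"
)

# Matches entries such as "mlx4_en(OE)" in the "Modules linked in:" list.
FLAGGED_MODULE_RE = re.compile(r"(?P<name>\w+)\((?P<flags>[A-Z]+)\)")


def parse_usercopy(line):
    """Return details of a usercopy report line, or None if it does not match."""
    m = USERCOPY_RE.search(line)
    if m is None:
        return None
    cache = m.group("cache")
    length = int(m.group("length"))
    # For the generic kmalloc caches the object size is part of the name.
    size = re.fullmatch(r"kmalloc-(\d+)", cache)
    object_size = int(size.group(1)) if size else None
    return {
        "kind": m.group("kind"),
        "address": int(m.group("addr"), 16),
        "cache": cache,
        "length": length,
        "object_size": object_size,
        # Bytes requested beyond the object size, when the size is known.
        "excess": length - object_size if object_size is not None else None,
    }


def flagged_modules(modules_text):
    """Return (name, flags) for every module listed with taint flags."""
    return [(m.group("name"), m.group("flags"))
            for m in FLAGGED_MODULE_RE.finditer(modules_text)]


if __name__ == "__main__":
    report = ("[17862.930418] usercopy: kernel memory exposure attempt "
              "detected from ffff9d54d26d8800 (kmalloc-512) (1024 bytes)")
    print(parse_usercopy(report))
    print(flagged_modules("rbd libceph mlx4_en(OE) mlx4_core(OE) xt_multiport"))
```

Run against the oops above, this shows a 1024-byte copy requested from
a 512-byte object, and only the Mellanox modules carrying the (OE)
flags, which is why I would start looking there.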

>
> Versions:
> CentOS Linux release 7.6.1810 (Core)
> Linux stratonode1.node.strato 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> librbd1-12.2.8-0.el7.x86_64
>
>
> [root@stratonode1 ~]# modinfo libceph
> filename: /lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/net/ceph/libceph.ko.xz
> license: GPL
> description: Ceph core library
> author: Patience Warnick <patience@xxxxxxxxxxxx>
> author: Yehuda Sadeh <yehuda@xxxxxxxxxxxxxxx>
> author: Sage Weil <sage@xxxxxxxxxxxx>
> retpoline: Y
> rhelversion: 7.6
> srcversion: 4F8CE6AEFA99B11C267981D
> depends: libcrc32c,dns_resolver
> intree: Y
> vermagic: 3.10.0-957.1.3.el7.x86_64 SMP mod_unload modversions
> signer: CentOS Linux kernel signing key
> sig_key: E7:CE:F3:61:3A:9B:8B:D0:12:FA:E7:49:82:72:15:9B:B1:87:9C:65
> sig_hashalgo: sha256
> [root@stratonode1 ~]# modinfo rbd
> filename: /lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/drivers/block/rbd.ko.xz
> license: GPL
> description: RADOS Block Device (RBD) driver
> author: Jeff Garzik <jeff@xxxxxxxxxx>
> author: Yehuda Sadeh <yehuda@xxxxxxxxxxxxxxx>
> author: Sage Weil <sage@xxxxxxxxxxxx>
> author: Alex Elder <elder@xxxxxxxxxxx>
> retpoline: Y
> rhelversion: 7.6
> srcversion: 5386BBBD00C262C66CB81F5
> depends: libceph
> intree: Y
> vermagic: 3.10.0-957.1.3.el7.x86_64 SMP mod_unload modversions
> signer: CentOS Linux kernel signing key
> sig_key: E7:CE:F3:61:3A:9B:8B:D0:12:FA:E7:49:82:72:15:9B:B1:87:9C:65
> sig_hashalgo: sha256
> parm: single_major:Use a single major number for all rbd devices (default: true) (bool)
>
> I reported the issue here as well:
> https://bugs.centos.org/view.php?id=15681
>
>
> Help will be appreciated.
>
> Thanks,
> Rom
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad