Re: kernel BUG at include/linux/ceph/decode.h:262


 



Hi Ilya,

I'm afraid this is all we have from the time of the crash:

Jan 25 21:18:42 sn319 kernel: beegfs: enabling unsafe global rkey
Jan 25 23:33:47 sn319 kernel: CPU: 12 PID: 123399 Comm: octave-cli Tainted: G           OE  ------------ T 3.10.0-957.12.2.el7.x86_64 #1
Jan 25 23:33:47 sn319 kernel: ------------[ cut here ]------------
Jan 25 23:33:47 sn319 kernel: Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.15.0 09/11/2020
Jan 25 23:33:47 sn319 kernel: igb i2c_algo_bit ixgbe dca ptp pps_core mdio sd_mod crc_t10dif crct10dif_common
Jan 25 23:33:47 sn319 kernel: invalid opcode: 0000 [#1] SMP
Jan 25 23:33:47 sn319 kernel: kernel BUG at include/linux/ceph/decode.h:262!
Jan 25 23:33:47 sn319 kernel: Modules linked in: squashfs loop overlay(T) 8021q garp mrp stp llc nfsv3 nfs_acl nfs lockd grace fscache beegfs(OE) ceph libceph libcrc32c dns_resolver rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) dcdbas mlx4_core(OE) amd64_edac_mod edac_mce_amd kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg ahci libahci mgag200 ttm libata ccp i2c_piix4 drm_kms_helper k10temp syscopyarea sysfillrect sysimgblt fb_sys_fops drm ipmi_si ipmi_devintf ipmi_msghandler drm_panel_orientation_quirks acpi_power_meter acpi_cpufreq sunrpc knem(OE) ip_tables mlx5_ib(OE) megaraid_sas ib_uverbs(OE) ib_core(OE) mlx5_core(OE) mlxfw(OE) devlink mlx_compat(OE)
Jan 29 15:33:18 sn319.hpc.ait.dtu.dk wwlogger: Running provision script: adhoc-pre

This is a stateless deployment and the node crashed hard. Syslog was unable to push any more lines to the log server, either because the network went offline or because everything on the node stopped. The next time we see this problem, we will try to access the crashed node and hope we can pull out more information.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: 31 January 2022 17:16:01
To: Frank Schilder
Cc: ceph-users; Jeff Layton
Subject: Re:  kernel BUG at include/linux/ceph/decode.h:262

On Mon, Jan 31, 2022 at 5:07 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> we observed server crashes with these possibly related error messages in the log showing up:
>
> Jan 26 10:07:53 sn180 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 25 23:33:47 sn319 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 25 16:32:37 sn323 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 25 14:05:07 sn328 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 26 18:47:40 sn369 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 27 21:43:25 sn376 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 28 09:11:00 sn424 kernel: kernel BUG at include/linux/ceph/decode.h:262!

The BUG appears to be

    BUG_ON(*p + 1 + sizeof(ino) + sizeof(len) + len > end);

in ceph_encode_filepath().
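
To illustrate what a check of this shape guards against, here is a minimal user-space sketch. It is not the kernel source: the function names filepath_fits() and encode_filepath() are hypothetical, the layout (one version byte, a 64-bit inode number, a 32-bit length, then the path bytes) is read off the terms of the BUG_ON expression above, and the sketch returns 0 where the kernel would BUG. If the check fires, it suggests the destination buffer was too small for the path being encoded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: true if 1 version byte + 64-bit ino + 32-bit length
 * + the path bytes fit between p and end. This mirrors the terms of
 * "*p + 1 + sizeof(ino) + sizeof(len) + len > end", inverted.
 */
static bool filepath_fits(const unsigned char *p, const unsigned char *end,
                          const char *path)
{
    uint32_t len = path ? (uint32_t)strlen(path) : 0;
    size_t need = 1 + sizeof(uint64_t) + sizeof(uint32_t) + (size_t)len;

    return need <= (size_t)(end - p);
}

/*
 * Hypothetical encoder: writes the fields and advances *p.
 * Returns the number of bytes written, or 0 if the buffer is too small
 * (the point at which the kernel code would hit the BUG_ON instead).
 * Fields are copied in host byte order to keep the sketch short.
 */
static size_t encode_filepath(unsigned char **p, unsigned char *end,
                              uint64_t ino, const char *path)
{
    uint32_t len = path ? (uint32_t)strlen(path) : 0;
    unsigned char *start = *p;

    if (!filepath_fits(*p, end, path))
        return 0;

    *(*p)++ = 1;                      /* version byte */
    memcpy(*p, &ino, sizeof(ino));    /* inode number */
    *p += sizeof(ino);
    memcpy(*p, &len, sizeof(len));    /* path length */
    *p += sizeof(len);
    if (len)
        memcpy(*p, path, len);        /* path bytes, no trailing NUL */
    *p += len;

    return (size_t)(*p - start);
}
```

With a 3-character path the encoding needs 1 + 8 + 4 + 3 = 16 bytes, so a 15-byte buffer is rejected while a 16-byte one is accepted.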

>
> The crash report says:
>
> Jan 25 23:33:47 sn319 kernel: ------------[ cut here ]------------
> Jan 25 23:33:47 sn319 kernel: kernel BUG at include/linux/ceph/decode.h:262!
> Jan 25 23:33:47 sn319 kernel: invalid opcode: 0000 [#1] SMP
> Jan 25 23:33:47 sn319 kernel: Modules linked in: squashfs loop overlay(T) 8021q garp mrp stp llc nfsv3 nfs_acl nfs lockd grace fscache beegfs(OE) ceph libceph libcrc32c dns_resolver rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) dcdbas mlx4_core(OE) amd64_edac_mod edac_mce_amd kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg ahci libahci mgag200 ttm libata ccp i2c_piix4 drm_kms_helper k10temp syscopyarea sysfillrect sysimgblt fb_sys_fops drm ipmi_si ipmi_devintf ipmi_msghandler drm_panel_orientation_quirks acpi_power_meter acpi_cpufreq sunrpc knem(OE) ip_tables mlx5_ib(OE) megaraid_sas ib_uverbs(OE) ib_core(OE) mlx5_core(OE) mlxfw(OE) devlink mlx_compat(OE)
> Jan 25 23:33:47 sn319 kernel: igb i2c_algo_bit ixgbe dca ptp pps_core mdio sd_mod crc_t10dif crct10dif_common
> Jan 25 23:33:47 sn319 kernel: CPU: 12 PID: 123399 Comm: octave-cli Tainted: G           OE  ------------ T 3.10.0-957.12.2.el7.x86_64 #1
> Jan 25 23:33:47 sn319 kernel: Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.15.0 09/11/2020

What about a stack trace that should follow here?

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



