Re: [ceph-users] kernel BUG at fs/ceph/inode.c:1197

+ceph-devel

On Thu, May 4, 2017 at 12:51 AM, James Poole <james.poole@xxxxxxxxxxxxx> wrote:
> Hello,
>
> We currently have a Ceph cluster supporting an OpenShift cluster, using
> CephFS and dynamic RBD provisioning. The client nodes appear to be
> triggering a kernel bug and are rebooting unexpectedly with the same
> message each time. The clients are running CentOS 7:
>
>       KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.10.2.el7.x86_64/vmlinux
>     DUMPFILE: /var/crash/127.0.0.1-2017-05-02-09:06:17/vmcore  [PARTIAL
> DUMP]
>         CPUS: 16
>         DATE: Tue May  2 09:06:15 2017
>       UPTIME: 00:43:14
> LOAD AVERAGE: 1.52, 1.40, 1.48
>        TASKS: 7408
>     NODENAME: [redacted]
>      RELEASE: 3.10.0-514.10.2.el7.x86_64
>      VERSION: #1 SMP Fri Mar 3 00:04:05 UTC 2017
>      MACHINE: x86_64  (1997 Mhz)
>       MEMORY: 32 GB
>        PANIC: "kernel BUG at fs/ceph/inode.c:1197!"
>          PID: 133
>      COMMAND: "kworker/1:1"
>         TASK: ffff8801399bde20  [THREAD_INFO: ffff880138d0c000]
>          CPU: 1
>        STATE: TASK_RUNNING (PANIC)
>
> [ 2596.061470] ------------[ cut here ]------------
> [ 2596.061499] kernel BUG at fs/ceph/inode.c:1197!
> [ 2596.061516] invalid opcode: 0000 [#1] SMP
> [ 2596.061535] Modules linked in: cfg80211 rfkill binfmt_misc veth ext4
> mbcache jbd2 rbd xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4
> xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter
> bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
> loop fuse ceph libceph dns_resolver vport_vxlan vxlan ip6_udp_tunnel
> udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6
> iptable_nat nf_nat_ipv4 nf_nat xt_limit nf_log_ipv4
> vmw_vsock_vmci_transport nf_log_common xt_LOG vsock nf_conntrack_ipv4
> nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack
> iptable_filter intel_powerclamp coretemp iosf_mbi crc32_pclmul
> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
> cryptd ppdev vmw_balloon pcspkr sg vmw_vmci shpchp i2c_piix4 parport_pc
> [ 2596.061875]  parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc
> ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
> ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect
> sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common mptspi
> crc32c_intel drm ata_piix scsi_transport_spi serio_raw mptscsih libata
> mptbase vmxnet3 i2c_core fjes dm_mirror dm_region_hash dm_log dm_mod
> [ 2596.062042] CPU: 1 PID: 133 Comm: kworker/1:1 Not tainted
> 3.10.0-514.10.2.el7.x86_64 #1
> [ 2596.062070] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 09/17/2015
> [ 2596.062118] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> [ 2596.062140] task: ffff8801399bde20 ti: ffff880138d0c000 task.ti:
> ffff880138d0c000
> [ 2596.062166] RIP: 0010:[<ffffffffa05d96c3>]  [<ffffffffa05d96c3>]
> ceph_fill_trace+0x893/0xa00 [ceph]
> [ 2596.062209] RSP: 0000:ffff880138d0fb80  EFLAGS: 00010287
> [ 2596.062230] RAX: ffff88083b079680 RBX: ffff8801efe86760 RCX:
> ffff880095e26c00
> [ 2596.062257] RDX: ffff880003e8f2c0 RSI: ffff88053b4c0a08 RDI:
> ffff88053b4c0a00
> [ 2596.062288] RBP: ffff880138d0fbf8 R08: ffff880003e8f2c0 R09:
> 0000000000000000
> [ 2596.062320] R10: 0000000000000001 R11: ffff8804256f3ac0 R12:
> ffff880121d15400
> [ 2596.062351] R13: ffff880138dd4000 R14: ffff88007053f280 R15:
> ffff8807ee10f2c0
> [ 2596.062379] FS:  0000000000000000(0000) GS:ffff88013b840000(0000)
> knlGS:0000000000000000
> [ 2596.062413] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2596.062436] CR2: 00007fe3bab2dcd0 CR3: 000000042ebe0000 CR4:
> 00000000001407e0
> [ 2596.062498] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 2596.062540] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 2596.062567] Stack:
> [ 2596.062578]  ffff880121d15778 ffff880121d15718 ffff880138d0fc50
> ffff880095e26e7a
> [ 2596.062612]  ffff880035c12400 ffff88053b4c7800 000000003b4c0800
> ffff880138d0fbb8
> [ 2596.062645]  ffff880138d0fbb8 00000000a5446715 ffff88053b4c0800
> ffff88008238ee10
> [ 2596.062681] Call Trace:
> [ 2596.062703]  [<ffffffffa05f96a8>] handle_reply+0x3e8/0xc80 [ceph]
> [ 2596.062736]  [<ffffffffa05fbd39>] dispatch+0xd9/0xaf0 [ceph]
> [ 2596.062762]  [<ffffffff815559ca>] ? kernel_recvmsg+0x3a/0x50
> [ 2596.062790]  [<ffffffffa057ceff>] try_read+0x4bf/0x1220 [libceph]
> [ 2596.062819]  [<ffffffffa057b743>] ? try_write+0xa13/0xe60 [libceph]
> [ 2596.062851]  [<ffffffffa057dd19>] ceph_con_workfn+0xb9/0x650 [libceph]
> [ 2596.062878]  [<ffffffff810a810b>] process_one_work+0x17b/0x470
> [ 2596.062902]  [<ffffffff810a8f46>] worker_thread+0x126/0x410
> [ 2596.062925]  [<ffffffff810a8e20>] ? rescuer_thread+0x460/0x460
> [ 2596.062949]  [<ffffffff810b06ff>] kthread+0xcf/0xe0
> [ 2596.064014]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
> [ 2596.065010]  [<ffffffff81696a58>] ret_from_fork+0x58/0x90
> [ 2596.065955]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
> [ 2596.066945] Code: e8 c3 2b d6 e0 e9 ca fa ff ff 4c 89 fa 48 c7 c6 07
> d0 60 a0 48 c7 c7 50 24 61 a0 31 c0 e8 a6 2b d6 e0 e9 cd fa ff ff 0f 0b
> 0f 0b <0f> 0b 0f 0b 48 8b 83 c8 fc ff ff 4c 8b 89 c8 fc ff ff 4c 89 fa
> [ 2596.069127] RIP  [<ffffffffa05d96c3>] ceph_fill_trace+0x893/0xa00 [ceph]
> [ 2596.070120]  RSP <ffff880138d0fb80>
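For anyone reading the oops: the "invalid opcode" trap is what a BUG_ON() looks like on x86-64. BUG() compiles to the ud2 instruction (bytes 0f 0b), and the byte the Code: line marks with <..> is exactly that, so this is a deliberate assertion failure in ceph_fill_trace(), not random memory corruption. A quick sketch checking this against the bytes quoted above (plain string inspection, nothing kernel-specific):

```shell
# The kernel's "Code:" line marks the byte at the faulting RIP with <..>.
# For a BUG_ON() on x86-64 that byte should start a ud2 instruction
# (0f 0b), which is what raises the "invalid opcode" trap seen above.
code_line='e8 c3 2b d6 e0 e9 ca fa ff ff 4c 89 fa 48 c7 c6 07 d0 60 a0 48 c7 c7 50 24 61 a0 31 c0 e8 a6 2b d6 e0 e9 cd fa ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 48 8b 83 c8 fc ff ff 4c 8b 89 c8 fc ff ff 4c 89 fa'

marked="${code_line#*<}"; marked="${marked%%>*}"   # byte at RIP
after="${code_line#*> }"; after="${after%% *}"     # byte after it

echo "bytes at RIP: $marked $after"
[ "$marked $after" = "0f 0b" ] && echo "ud2, i.e. a BUG_ON(), as expected"
```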
>
>
> Just before the above there are many messages like these, from all of
> the ceph node IPs:
> [  933.282441] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
> MAC=00:50:56:0f:9a:47:00:50:56:35:28:f1:08:00 SRC=192.168.5.6
> DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20778 DF
> PROTO=TCP SPT=6816 DPT=47140 WINDOW=2406 RES=0x00 ACK FIN URGP=0
> [  933.922440] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
> MAC=00:50:56:0f:9a:47:00:50:56:35:28:f1:08:00 SRC=192.168.5.6
> DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=1440 DF
> PROTO=TCP SPT=6800 DPT=56290 WINDOW=2889 RES=0x00 ACK FIN URGP=0
> [  934.031555] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
> MAC=00:50:56:0f:9a:47:00:50:56:26:f3:39:08:00 SRC=192.168.5.7
> DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=58232 DF
> PROTO=TCP SPT=6812 DPT=59564 WINDOW=8433 RES=0x00 ACK FIN URGP=0
> [  934.031579] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
> MAC=00:50:56:0f:9a:47:00:50:56:26:f3:39:08:00 SRC=192.168.5.7
> DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20084 DF
> PROTO=TCP SPT=6816 DPT=55574 WINDOW=2925 RES=0x00 ACK FIN URGP=0
> [  934.105440] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
> MAC=00:50:56:0f:9a:47:00:50:56:37:f8:4c:08:00 SRC=192.168.5.4
> DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=48428 DF
> PROTO=TCP SPT=6804 DPT=59156 WINDOW=6422 RES=0x00 ACK FIN URGP=0
> [  935.133060] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
> MAC=00:50:56:0f:9a:47:00:50:56:0d:13:27:08:00 SRC=192.168.5.3
> DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=35384 DF
> PROTO=TCP SPT=6817 DPT=52674 WINDOW=24576 RES=0x00 ACK FIN URGP=0
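On the firewall side: every dropped packet above is a FIN/ACK arriving from an OSD port (6800-6817) on the ceph nodes, which suggests the INPUT chain is discarding the tail end of established OSD sessions rather than blocking new ones. A hedged sketch of rules that would admit that traffic ahead of the logging drop; the 192.168.5.0/24 source network is taken from the log above, and 6789 / 6800:7300 are ceph's default monitor and OSD/MDS port ranges, so both are assumptions to verify against the real deployment:

```shell
# Sketch only, not a tested ruleset. Accept return traffic for tracked
# connections first, then explicitly allow the ceph public network's
# monitor (6789) and OSD/MDS (6800-7300) ports. The source network is
# assumed from the log lines above.
iptables -I INPUT 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -I INPUT 2 -p tcp -s 192.168.5.0/24 \
    -m multiport --sports 6789,6800:7300 -j ACCEPT
```

Whether the firewall drops and the ceph_fill_trace BUG are related is not clear from the dump alone, but severing OSD/MDS sessions mid-flight is worth ruling out first.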
>
> Many thanks
>
> James
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


