Re: mds segfault on cephfs snapshot creation

On Wed, Apr 20, 2016 at 4:09 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Wed, Apr 20, 2016 at 12:12 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> As soon as I create a snapshot on the root of my test cephfs deployment with
> a single file within the root, my mds server kernel panics. I understand
> that snapshots are not recommended. Is it beneficial to developers for me to
> leave my cluster in its present state and provide whatever debugging
> information they'd like? I'm not really looking for a solution to a
> mission-critical issue so much as providing an opportunity for developers to
> pull stack traces, logs, etc. from a system affected by some sort of bug in
> cephfs/mds. This happens every time I create a directory inside my .snap
> directory.

It's likely your kernel is too old for the kernel mount. Which kernel
version are you running?

All nodes in the cluster run the versions listed below. This actually appears to be a CephFS native (kernel) client issue rather than an MDS one (see the stack trace and kernel dump below). I have my filesystem mounted on my MDS host, which is why I thought the MDS was causing the panic.
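
For reference, the sequence that triggers the panic looks roughly like the following. The mount point, secret file path, and file/snapshot names are placeholders rather than my exact values; the monitor address is the one shown in ceph status below, and .snap is the default hidden snapshot directory.

# native (kernel) CephFS mount on the MDS host
sudo mount -t ceph 192.168.1.120:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# a single file in the root, then a snapshot created by making a
# directory under .snap; the client oopses while handling the MDS
# reply to this mkdir (handle_reply -> ceph_fill_trace -> splice_dentry)
echo test | sudo tee /mnt/cephfs/file1
sudo mkdir /mnt/cephfs/.snap/snap1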

Linux mon0 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

ceph-admin@mon0:~$ cat /etc/issue
Ubuntu 14.04.4 LTS \n \l

ceph-admin@mon0:~$ dpkg -l | grep ceph | tr -s ' ' | cut -d ' ' -f 2,3
ceph 0.80.11-0ubuntu1.14.04.1
ceph-common 0.80.11-0ubuntu1.14.04.1
ceph-deploy 1.4.0-0ubuntu1
ceph-fs-common 0.80.11-0ubuntu1.14.04.1
ceph-mds 0.80.11-0ubuntu1.14.04.1
libcephfs1 0.80.11-0ubuntu1.14.04.1
python-ceph 0.80.11-0ubuntu1.14.04.1


ceph-admin@mon0:~$ ceph status
    cluster 186408c3-df8a-4e46-a397-a788fc380039
     health HEALTH_OK
     monmap e1: 1 mons at {mon0=192.168.1.120:6789/0}, election epoch 1, quorum 0 mon0
     mdsmap e48: 1/1/1 up {0=mon0=up:active}
     osdmap e206: 15 osds: 15 up, 15 in
      pgmap v25298: 704 pgs, 5 pools, 123 MB data, 53 objects
            1648 MB used, 13964 GB / 13965 GB avail
                 704 active+clean


ceph-admin@mon0:~$ ceph osd tree
# id    weight  type name       up/down reweight
-1      13.65   root default
-2      2.73            host osd0
0       0.91                    osd.0   up      1
1       0.91                    osd.1   up      1
2       0.91                    osd.2   up      1
-3      2.73            host osd1
3       0.91                    osd.3   up      1
4       0.91                    osd.4   up      1
5       0.91                    osd.5   up      1
-4      2.73            host osd2
6       0.91                    osd.6   up      1
7       0.91                    osd.7   up      1
8       0.91                    osd.8   up      1
-5      2.73            host osd3
9       0.91                    osd.9   up      1
10      0.91                    osd.10  up      1
11      0.91                    osd.11  up      1
-6      2.73            host osd4
12      0.91                    osd.12  up      1
13      0.91                    osd.13  up      1
14      0.91                    osd.14  up      1



[ 5869.157340] ------------[ cut here ]------------
[ 5869.157527] kernel BUG at /build/linux-faWYrf/linux-3.13.0/fs/ceph/inode.c:928!
[ 5869.157797] invalid opcode: 0000 [#1] SMP
[ 5869.157977] Modules linked in: kvm_intel kvm serio_raw ceph libceph libcrc32c fscache psmouse floppy
[ 5869.158415] CPU: 0 PID: 46 Comm: kworker/0:1 Not tainted 3.13.0-77-generic #121-Ubuntu
[ 5869.158709] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 5869.158925] Workqueue: ceph-msgr con_work [libceph]
[ 5869.159124] task: ffff8809abf3c800 ti: ffff8809abf46000 task.ti: ffff8809abf46000
[ 5869.159422] RIP: 0010:[<ffffffffa009edd5>]  [<ffffffffa009edd5>] splice_dentry+0xd5/0x190 [ceph]
[ 5869.159768] RSP: 0018:ffff8809abf47b68  EFLAGS: 00010282
[ 5869.159963] RAX: 0000000000000004 RBX: ffff8809a08b2780 RCX: 0000000000000001
[ 5869.160224] RDX: 0000000000000000 RSI: ffff8809a04f8370 RDI: ffff8809a08b2780
[ 5869.160484] RBP: ffff8809abf47ba8 R08: ffff8809a982c400 R09: ffff8809a99ef6e8
[ 5869.160550] R10: 00000000000819d8 R11: 0000000000000000 R12: ffff8809a04f8370
[ 5869.160550] R13: ffff8809a08b2780 R14: ffff8809aad5fc00 R15: 0000000000000000
[ 5869.160550] FS:  0000000000000000(0000) GS:ffff8809e3c00000(0000) knlGS:0000000000000000
[ 5869.160550] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5869.160550] CR2: 00007f60f37ff5c0 CR3: 00000009a5f63000 CR4: 00000000000006f0
[ 5869.160550] Stack:
[ 5869.160550]  ffff8809a5da1000 ffff8809aad5fc00 ffff8809a99ef408 ffff8809a99ef400
[ 5869.160550]  ffff8809a04f8370 ffff8809a08b2780 ffff8809aad5fc00 0000000000000000
[ 5869.160550]  ffff8809abf47c08 ffffffffa00a0dc7 ffff8809a982c544 ffff8809ab3f5400
[ 5869.160550] Call Trace:
[ 5869.160550]  [<ffffffffa00a0dc7>] ceph_fill_trace+0x2a7/0x770 [ceph]
[ 5869.160550]  [<ffffffffa00bb2c5>] handle_reply+0x3d5/0xc70 [ceph]
[ 5869.160550]  [<ffffffffa00bd437>] dispatch+0xe7/0xa90 [ceph]
[ 5869.160550]  [<ffffffffa0053a78>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
[ 5869.160550]  [<ffffffffa0056a9b>] try_read+0x4ab/0x10d0 [libceph]
[ 5869.160550]  [<ffffffff8104f28f>] ? kvm_clock_read+0x1f/0x30
[ 5869.160550]  [<ffffffff810a0685>] ? set_next_entity+0x95/0xb0
[ 5869.160550]  [<ffffffffa00588d9>] con_work+0xb9/0x640 [libceph]
[ 5869.160550]  [<ffffffff81083cd2>] process_one_work+0x182/0x450
[ 5869.160550]  [<ffffffff81084ac1>] worker_thread+0x121/0x410
[ 5869.160550]  [<ffffffff810849a0>] ? rescuer_thread+0x430/0x430
[ 5869.160550]  [<ffffffff8108b8a2>] kthread+0xd2/0xf0
[ 5869.160550]  [<ffffffff8108b7d0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 5869.160550]  [<ffffffff81735c68>] ret_from_fork+0x58/0x90
[ 5869.160550]  [<ffffffff8108b7d0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 5869.160550] Code: e7 e8 20 60 13 e1 eb c7 66 0f 1f 44 00 00 48 83 7b 78 00 0f 84 c2 00 00 00 f6 05 80 32 03 00 04 0f 85 83 00 00 00 49 89 dc eb 98 <0f> 0b 4d 8b 8e 98 fc ff ff 4d 8b 86 90 fc ff ff 48 89 c6 4c 89
[ 5869.160550] RIP  [<ffffffffa009edd5>] splice_dentry+0xd5/0x190 [ceph]
[ 5869.160550]  RSP <ffff8809abf47b68>
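
If more verbose logs would help, this is roughly what I'd plan to enable before reproducing it again. This is only a sketch; the exact option names and the MDS restart procedure should be double-checked against this Firefly/3.13 setup.

# MDS side: raise debug levels in /etc/ceph/ceph.conf, then restart the MDS
[mds]
    debug mds = 20
    debug ms = 1

# kernel client side: enable dynamic debug output for the ceph modules
# (assumes debugfs is mounted and the kernel has dynamic debug; output
# goes to dmesg/syslog)
sudo sh -c "echo 'module ceph +p'    > /sys/kernel/debug/dynamic_debug/control"
sudo sh -c "echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control"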

>
> Let me know if I should blow my cluster away?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
