Re: mds segfault on cephfs snapshot creation

On Wed, Apr 20, 2016 at 11:52 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>
>
> On Wed, Apr 20, 2016 at 4:09 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Wed, Apr 20, 2016 at 12:12 PM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
>> > As soon as I create a snapshot on the root of my test cephfs deployment
>> > (with a single file in the root), my mds server kernel panics. I
>> > understand that snapshots are not recommended. Would it be useful to
>> > developers if I left my cluster in its present state and provided
>> > whatever debugging information they'd like? I'm not really looking for a
>> > solution to a mission-critical issue so much as offering developers a
>> > chance to pull stack traces, logs, etc. from a system affected by some
>> > sort of bug in cephfs/mds. This happens every time I create a directory
>> > inside my .snap directory (see the reproduction sketch after this quoted
>> > exchange).
>>
>> It's likely your kernel is too old for the kernel mount. Which version
>> of the kernel are you using?
>
>
> All nodes in the cluster share the versions listed below. This actually
> appears to be a cephfs client (native) issue (see the stack trace and kernel
> dump below). I have my fs mounted on my mds, which is why I thought it was
> the mds causing the panic.
>
> Linux mon0 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
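
For reference, a minimal sketch of the reproduction described above. The
mount point, snapshot name, and secret file path are assumptions for
illustration, not taken from this cluster; only the monitor address comes
from the ceph status output further down:

    # mount CephFS with the kernel client, then create a snapshot by making
    # a directory inside the special .snap directory at the filesystem root
    $ sudo mount -t ceph 192.168.1.120:6789:/ /mnt/cephfs \
          -o name=admin,secretfile=/etc/ceph/admin.secret
    $ touch /mnt/cephfs/somefile
    $ mkdir /mnt/cephfs/.snap/snap1    # this mkdir is what triggers the crash
    $ rmdir /mnt/cephfs/.snap/snap1    # snapshots are removed the same way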

Please use a 4.x kernel. Besides, ceph-mds 0.80 is too old for using
snapshots; creating snapshots on it can cause various issues.

Regards
Yan, Zheng
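
A couple of hedged follow-ups to the advice above (the package and command
names are from memory of the trusty/firefly era and should be checked against
the docs rather than taken as confirmed by this thread):

    # a 4.x kernel on Ubuntu 14.04 via the hardware-enablement kernel, if I
    # remember the trusty enablement-stack package name correctly
    $ sudo apt-get install linux-generic-lts-wily

    # on later Ceph releases snapshot creation is gated behind an MDS flag
    # and, as far as I recall, must be enabled before mkdir in .snap works
    $ ceph mds set allow_new_snaps true --yes-i-really-mean-it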

>
> ceph-admin@mon0:~$ cat /etc/issue
> Ubuntu 14.04.4 LTS \n \l
>
> ceph-admin@mon0:~$ dpkg -l | grep ceph | tr -s ' ' | cut -d ' ' -f 2,3
> ceph 0.80.11-0ubuntu1.14.04.1
> ceph-common 0.80.11-0ubuntu1.14.04.1
> ceph-deploy 1.4.0-0ubuntu1
> ceph-fs-common 0.80.11-0ubuntu1.14.04.1
> ceph-mds 0.80.11-0ubuntu1.14.04.1
> libcephfs1 0.80.11-0ubuntu1.14.04.1
> python-ceph 0.80.11-0ubuntu1.14.04.1
>
>
> ceph-admin@mon0:~$ ceph status
>     cluster 186408c3-df8a-4e46-a397-a788fc380039
>      health HEALTH_OK
>      monmap e1: 1 mons at {mon0=192.168.1.120:6789/0}, election epoch 1,
> quorum 0 mon0
>      mdsmap e48: 1/1/1 up {0=mon0=up:active}
>      osdmap e206: 15 osds: 15 up, 15 in
>       pgmap v25298: 704 pgs, 5 pools, 123 MB data, 53 objects
>             1648 MB used, 13964 GB / 13965 GB avail
>                  704 active+clean
>
>
> ceph-admin@mon0:~$ ceph osd tree
> # id    weight  type name       up/down reweight
> -1      13.65   root default
> -2      2.73            host osd0
> 0       0.91                    osd.0   up      1
> 1       0.91                    osd.1   up      1
> 2       0.91                    osd.2   up      1
> -3      2.73            host osd1
> 3       0.91                    osd.3   up      1
> 4       0.91                    osd.4   up      1
> 5       0.91                    osd.5   up      1
> -4      2.73            host osd2
> 6       0.91                    osd.6   up      1
> 7       0.91                    osd.7   up      1
> 8       0.91                    osd.8   up      1
> -5      2.73            host osd3
> 9       0.91                    osd.9   up      1
> 10      0.91                    osd.10  up      1
> 11      0.91                    osd.11  up      1
> -6      2.73            host osd4
> 12      0.91                    osd.12  up      1
> 13      0.91                    osd.13  up      1
> 14      0.91                    osd.14  up      1
>
>
> http://tech-hell.com/dump.201604201536
>
> [ 5869.157340] ------------[ cut here ]------------
> [ 5869.157527] kernel BUG at
> /build/linux-faWYrf/linux-3.13.0/fs/ceph/inode.c:928!
> [ 5869.157797] invalid opcode: 0000 [#1] SMP
> [ 5869.157977] Modules linked in: kvm_intel kvm serio_raw ceph libceph
> libcrc32c fscache psmouse floppy
> [ 5869.158415] CPU: 0 PID: 46 Comm: kworker/0:1 Not tainted
> 3.13.0-77-generic #121-Ubuntu
> [ 5869.158709] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [ 5869.158925] Workqueue: ceph-msgr con_work [libceph]
> [ 5869.159124] task: ffff8809abf3c800 ti: ffff8809abf46000 task.ti:
> ffff8809abf46000
> [ 5869.159422] RIP: 0010:[<ffffffffa009edd5>]  [<ffffffffa009edd5>]
> splice_dentry+0xd5/0x190 [ceph]
> [ 5869.159768] RSP: 0018:ffff8809abf47b68  EFLAGS: 00010282
> [ 5869.159963] RAX: 0000000000000004 RBX: ffff8809a08b2780 RCX:
> 0000000000000001
> [ 5869.160224] RDX: 0000000000000000 RSI: ffff8809a04f8370 RDI:
> ffff8809a08b2780
> [ 5869.160484] RBP: ffff8809abf47ba8 R08: ffff8809a982c400 R09:
> ffff8809a99ef6e8
> [ 5869.160550] R10: 00000000000819d8 R11: 0000000000000000 R12:
> ffff8809a04f8370
> [ 5869.160550] R13: ffff8809a08b2780 R14: ffff8809aad5fc00 R15:
> 0000000000000000
> [ 5869.160550] FS:  0000000000000000(0000) GS:ffff8809e3c00000(0000)
> knlGS:0000000000000000
> [ 5869.160550] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 5869.160550] CR2: 00007f60f37ff5c0 CR3: 00000009a5f63000 CR4:
> 00000000000006f0
> [ 5869.160550] Stack:
> [ 5869.160550]  ffff8809a5da1000 ffff8809aad5fc00 ffff8809a99ef408
> ffff8809a99ef400
> [ 5869.160550]  ffff8809a04f8370 ffff8809a08b2780 ffff8809aad5fc00
> 0000000000000000
> [ 5869.160550]  ffff8809abf47c08 ffffffffa00a0dc7 ffff8809a982c544
> ffff8809ab3f5400
> [ 5869.160550] Call Trace:
> [ 5869.160550]  [<ffffffffa00a0dc7>] ceph_fill_trace+0x2a7/0x770 [ceph]
> [ 5869.160550]  [<ffffffffa00bb2c5>] handle_reply+0x3d5/0xc70 [ceph]
> [ 5869.160550]  [<ffffffffa00bd437>] dispatch+0xe7/0xa90 [ceph]
> [ 5869.160550]  [<ffffffffa0053a78>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
> [ 5869.160550]  [<ffffffffa0056a9b>] try_read+0x4ab/0x10d0 [libceph]
> [ 5869.160550]  [<ffffffff8104f28f>] ? kvm_clock_read+0x1f/0x30
> [ 5869.160550]  [<ffffffff810a0685>] ? set_next_entity+0x95/0xb0
> [ 5869.160550]  [<ffffffffa00588d9>] con_work+0xb9/0x640 [libceph]
> [ 5869.160550]  [<ffffffff81083cd2>] process_one_work+0x182/0x450
> [ 5869.160550]  [<ffffffff81084ac1>] worker_thread+0x121/0x410
> [ 5869.160550]  [<ffffffff810849a0>] ? rescuer_thread+0x430/0x430
> [ 5869.160550]  [<ffffffff8108b8a2>] kthread+0xd2/0xf0
> [ 5869.160550]  [<ffffffff8108b7d0>] ? kthread_create_on_node+0x1c0/0x1c0
> [ 5869.160550]  [<ffffffff81735c68>] ret_from_fork+0x58/0x90
> [ 5869.160550]  [<ffffffff8108b7d0>] ? kthread_create_on_node+0x1c0/0x1c0
> [ 5869.160550] Code: e7 e8 20 60 13 e1 eb c7 66 0f 1f 44 00 00 48 83 7b 78
> 00 0f 84 c2 00 00 00 f6 05 80 32 03 00 04 0f 85 83 00 00 00 49 89 dc eb 98
> <0f> 0b 4d 8b 8e 98 fc ff ff 4d 8b 86 90 fc ff ff 48 89 c6 4c 89
> [ 5869.160550] RIP  [<ffffffffa009edd5>] splice_dentry+0xd5/0x190 [ceph]
> [ 5869.160550]  RSP <ffff8809abf47b68>
>
>
>
>>
>>
>>
>>
>> >
>> > Let me know whether I should blow my cluster away.
>> >
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
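
To map the RIP in the trace above (splice_dentry+0xd5 in ceph.ko) back to a
source line, one option is gdb against the module with debug symbols. The
path below assumes the matching Ubuntu dbgsym/ddeb package for this
3.13.0-77 kernel is installed and follows the usual ddeb layout:

    $ gdb /usr/lib/debug/lib/modules/3.13.0-77-generic/kernel/fs/ceph/ceph.ko
    (gdb) list *(splice_dentry+0xd5)

This should land on the BUG_ON that the log reports at fs/ceph/inode.c:928.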
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


