Fwd: "rbd map" command hangs

jay.janardhan@xxxxxxxxxx (Jay Janardhan) · Tue, 20 May 2014 09:09:44 -0400

Thanks again Ilya.

I was following these recommendations:
http://ceph.com/docs/master/start/os-recommendations/. Should this page be
updated in that case?

I'm going to upgrade to 3.9. Should I update the Ceph cluster nodes as well
or just the ceph block device client?

On Tue, May 20, 2014 at 3:20 AM, Ilya Dryomov <ilya.dryomov at inktank.com> wrote:

> On Tue, May 20, 2014 at 7:52 AM, Jay Janardhan <jay.janardhan at kaseya.com>
> wrote:
> > Got the stack trace when it crashed. I had to enable serial port to
> > capture this. Would this help?
> >
> > [  172.227318] libceph: mon0 192.168.56.102:6789 feature set mismatch, my 40002 < server's 20042040002, missing 20042000000
> > [  172.451109] libceph: mon0 192.168.56.102:6789 socket error on read
> > [  172.539837] ------------[ cut here ]------------
> > [  172.640704] kernel BUG at /home/apw/COD/linux/net/ceph/messenger.c:2366!
> > [  172.740775] invalid opcode: 0000 [#1] SMP
> > [  172.805429] Modules linked in: rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ext2 ppdev microcode psmouse serio_raw parport_pc i2c_piix4 mac_hid lp parport e1000
> > [  173.072985] CPU 0
> > [  173.143909] Pid: 385, comm: kworker/0:3 Not tainted 3.6.9-030609-generic #201212031610 innotek GmbH VirtualBox/VirtualBox
> > [  173.358836] RIP: 0010:[<ffffffffa0183ff7>]  [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
> > [  173.629918] RSP: 0018:ffff88007b497d90  EFLAGS: 00010286
> > [  173.731786] RAX: fffffffffffffffe RBX: ffff88007b909298 RCX: 0000000000000003
> > [  173.901361] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000039
> > [  174.040360] RBP: ffff88007b497dc0 R08: 000000000000000a R09: 000000000000fffb
> > [  174.235587] R10: 0000000000000000 R11: 0000000000000199 R12: ffff88007b9092c8
> > [  174.385067] R13: 0000000000000000 R14: ffffffffa0199580 R15: ffffffffa0195773
> > [  174.541288] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> > [  174.620856] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  174.740551] CR2: 00007fefd16c5168 CR3: 000000007bb41000 CR4: 00000000000006f0
> > [  174.948095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  175.076881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  175.320731] Process kworker/0:3 (pid: 385, threadinfo ffff88007b496000, task ffff880079735bc0)
> > [  175.565218] Stack:
> > [  175.630655]  0000000000000000 ffff88007b909298 ffff88007b909690 ffff88007b9093d0
> > [  175.699571]  ffff88007b909418 ffff88007fc0e300 ffff88007b497df0 ffffffffa018525c
> > [  175.710012]  ffff88007b909690 ffff880078e4d800 ffff88007fc1bf00 ffff88007fc0e340
> > [  175.859748] Call Trace:
> > [  175.909572]  [<ffffffffa018525c>] con_work+0x14c/0x1c0 [libceph]
> > [  176.010436]  [<ffffffff810763b6>] process_one_work+0x136/0x550
> > [  176.131098]  [<ffffffffa0185110>] ? try_read+0x440/0x440 [libceph]
> > [  176.249904]  [<ffffffff810775b5>] worker_thread+0x165/0x3c0
> > [  176.368412]  [<ffffffff81077450>] ? manage_workers+0x190/0x190
> > [  176.512415]  [<ffffffff8107c5e3>] kthread+0x93/0xa0
> > [  176.623469]  [<ffffffff816b8c04>] kernel_thread_helper+0x4/0x10
> > [  176.670502]  [<ffffffff8107c550>] ? flush_kthread_worker+0xb0/0xb0
> > [  176.731089]  [<ffffffff816b8c00>] ? gs_change+0x13/0x13
> > [  176.901284] Code: 00 00 00 00 48 8b 83 38 01 00 00 a8 02 0f 85 f6 fe ff ff 3e 80 a3 38 01 00 00 fb 48 c7 83 40 01 00 00 06 00 00 00 e9 37 ff ff ff <0f> 0b 0f 0b 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8
> > [  177.088895] RIP  [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
> > [  177.251573]  RSP <ffff88007b497d90>
> > [  177.310320] ---[ end trace f66ddfdda09b9821 ]---
>
> OK, it definitely shouldn't have crashed here and there is a patch in
> later kernels that prevents this crash from happening.  But, because
> 3.6 is too old and misses features, which is reported just prior to the
> crash splat, you wouldn't be able to use it with firefly userspace even
> if it didn't crash.
>
> You are going to need to run at least 3.9 and then disable either the
> vary_r tunable in your crushmap (vary_r will only be supported starting
> with 3.15) or primary-affinity adjustments - I can't tell which one it
> is just from the feature set mismatch message.
>
> Thanks,
>
>                 Ilya
>
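
For anyone reading this thread later: the "feature set mismatch" line quoted above can be decoded into individual feature bit positions. The sketch below is only an illustration, not part of the original exchange. It assumes that libceph prints the three masks in hexadecimal without a "0x" prefix; the helper names parse_mismatch and set_bits are made up for this example. It reports bit positions only. Mapping those positions to named features requires the feature definitions in the kernel and Ceph sources, which are not reproduced here.

```python
import re

# Assumption: the masks in the libceph message are hexadecimal without "0x".
# The line below is the message from the trace, joined onto a single line.
LINE = ("libceph: mon0 192.168.56.102:6789 feature set mismatch, "
        "my 40002 < server's 20042040002, missing 20042000000")


def parse_mismatch(line):
    """Return (mine, server, missing) as integers from a mismatch line."""
    m = re.search(
        r"my ([0-9a-f]+) < server's ([0-9a-f]+), missing ([0-9a-f]+)", line
    )
    if m is None:
        raise ValueError("not a feature set mismatch line")
    return tuple(int(group, 16) for group in m.groups())


def set_bits(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


mine, server, missing = parse_mismatch(LINE)

# "missing" should be exactly the server's bits that the client lacks.
assert (server & ~mine) == missing

print("client feature bits: ", set_bits(mine))
print("missing feature bits:", set_bits(missing))
```

Under that hexadecimal assumption, the missing mask is simply the server's bits that are absent from the client's mask, so the same two helpers can be pointed at any other mismatch line from the kernel log.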