On Tue, May 20, 2014 at 7:52 AM, Jay Janardhan <jay.janardhan at kaseya.com> wrote:
> Got the stack trace when it crashed. I had to enable serial port to capture
> this. Would this help?
>
> [ 172.227318] libceph: mon0 192.168.56.102:6789 feature set mismatch, my 40002 < server's 20042040002, missing 20042000000
> [ 172.451109] libceph: mon0 192.168.56.102:6789 socket error on read
> [ 172.539837] ------------[ cut here ]------------
> [ 172.640704] kernel BUG at /home/apw/COD/linux/net/ceph/messenger.c:2366!
> [ 172.740775] invalid opcode: 0000 [#1] SMP
> [ 172.805429] Modules linked in: rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ext2 ppdev microcode psmouse serio_raw parport_pc i2c_piix4 mac_hid lp parport e1000
> [ 173.072985] CPU 0
> [ 173.143909] Pid: 385, comm: kworker/0:3 Not tainted 3.6.9-030609-generic #201212031610 innotek GmbH VirtualBox/VirtualBox
> [ 173.358836] RIP: 0010:[<ffffffffa0183ff7>] [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
> [ 173.629918] RSP: 0018:ffff88007b497d90 EFLAGS: 00010286
> [ 173.731786] RAX: fffffffffffffffe RBX: ffff88007b909298 RCX: 0000000000000003
> [ 173.901361] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000039
> [ 174.040360] RBP: ffff88007b497dc0 R08: 000000000000000a R09: 000000000000fffb
> [ 174.235587] R10: 0000000000000000 R11: 0000000000000199 R12: ffff88007b9092c8
> [ 174.385067] R13: 0000000000000000 R14: ffffffffa0199580 R15: ffffffffa0195773
> [ 174.541288] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [ 174.620856] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 174.740551] CR2: 00007fefd16c5168 CR3: 000000007bb41000 CR4: 00000000000006f0
> [ 174.948095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 175.076881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 175.320731] Process kworker/0:3 (pid: 385, threadinfo ffff88007b496000, task ffff880079735bc0)
> [ 175.565218] Stack:
> [ 175.630655] 0000000000000000 ffff88007b909298 ffff88007b909690 ffff88007b9093d0
> [ 175.699571] ffff88007b909418 ffff88007fc0e300 ffff88007b497df0 ffffffffa018525c
> [ 175.710012] ffff88007b909690 ffff880078e4d800 ffff88007fc1bf00 ffff88007fc0e340
> [ 175.859748] Call Trace:
> [ 175.909572] [<ffffffffa018525c>] con_work+0x14c/0x1c0 [libceph]
> [ 176.010436] [<ffffffff810763b6>] process_one_work+0x136/0x550
> [ 176.131098] [<ffffffffa0185110>] ? try_read+0x440/0x440 [libceph]
> [ 176.249904] [<ffffffff810775b5>] worker_thread+0x165/0x3c0
> [ 176.368412] [<ffffffff81077450>] ? manage_workers+0x190/0x190
> [ 176.512415] [<ffffffff8107c5e3>] kthread+0x93/0xa0
> [ 176.623469] [<ffffffff816b8c04>] kernel_thread_helper+0x4/0x10
> [ 176.670502] [<ffffffff8107c550>] ? flush_kthread_worker+0xb0/0xb0
> [ 176.731089] [<ffffffff816b8c00>] ? gs_change+0x13/0x13
> [ 176.901284] Code: 00 00 00 00 48 8b 83 38 01 00 00 a8 02 0f 85 f6 fe ff ff 3e 80 a3 38 01 00 00 fb 48 c7 83 40 01 00 00 06 00 00 00 e9 37 ff ff ff <0f> 0b 0f 0b 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8
> [ 177.088895] RIP [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
> [ 177.251573] RSP <ffff88007b497d90>
> [ 177.310320] ---[ end trace f66ddfdda09b9821 ]---

OK, it definitely shouldn't have crashed here, and there is a patch in later
kernels that prevents this crash from happening.

But 3.6 is too old and is missing features (that is what the feature set
mismatch message just before the crash splat reports), so you wouldn't be
able to use it with firefly userspace even if it didn't crash.

You are going to need to run at least 3.9 and then either disable the vary_r
tunable in your crushmap (vary_r will only be supported starting with 3.15)
or disable primary-affinity adjustments. I can't tell which one it is just
from the feature set mismatch message.

Thanks,

                Ilya
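For reference, the "missing" value in the mismatch line is just the server's required feature mask with the client's advertised bits cleared (server & ~mine). A minimal sketch that redoes that arithmetic on the two masks from the log and lists the bit positions the 3.6 client lacks; the helper names are hypothetical, and mapping bit positions to Ceph feature names is deliberately left out here.

```python
# Masks copied verbatim from the log line:
# "feature set mismatch, my 40002 < server's 20042040002, missing 20042000000"
MY_FEATURES = 0x40002
SERVER_FEATURES = 0x20042040002


def missing_features(mine: int, server: int) -> int:
    """Bits the server requires that the client does not advertise (hypothetical helper)."""
    return server & ~mine


def set_bits(mask: int) -> list[int]:
    """Positions of the bits set in mask, lowest first (hypothetical helper)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


missing = missing_features(MY_FEATURES, SERVER_FEATURES)
print(hex(missing))       # 0x20042000000, matching the kernel message
print(set_bits(missing))  # [25, 30, 41]
```

Several feature names can share a single bit, so a mask alone does not always tell you which server-side setting triggered the requirement.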