Thanks again, Ilya. I was following these recommendations:
http://ceph.com/docs/master/start/os-recommendations/
Should that page be updated in that case?

I'm going to upgrade to 3.9. Should I update the Ceph cluster nodes as
well, or just the Ceph block device client?

On Tue, May 20, 2014 at 3:20 AM, Ilya Dryomov <ilya.dryomov at inktank.com> wrote:

> On Tue, May 20, 2014 at 7:52 AM, Jay Janardhan <jay.janardhan at kaseya.com> wrote:
> > Got the stack trace when it crashed. I had to enable the serial port
> > to capture this. Would this help?
> >
> > [ 172.227318] libceph: mon0 192.168.56.102:6789 feature set mismatch, my 40002 < server's 20042040002, missing 20042000000
> > [ 172.451109] libceph: mon0 192.168.56.102:6789 socket error on read
> > [ 172.539837] ------------[ cut here ]------------
> > [ 172.640704] kernel BUG at /home/apw/COD/linux/net/ceph/messenger.c:2366!
> > [ 172.740775] invalid opcode: 0000 [#1] SMP
> > [ 172.805429] Modules linked in: rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ext2 ppdev microcode psmouse serio_raw parport_pc i2c_piix4 mac_hid lp parport e1000
> > [ 173.072985] CPU 0
> > [ 173.143909] Pid: 385, comm: kworker/0:3 Not tainted 3.6.9-030609-generic #201212031610 innotek GmbH VirtualBox/VirtualBox
> > [ 173.358836] RIP: 0010:[<ffffffffa0183ff7>] [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
> > [ 173.629918] RSP: 0018:ffff88007b497d90 EFLAGS: 00010286
> > [ 173.731786] RAX: fffffffffffffffe RBX: ffff88007b909298 RCX: 0000000000000003
> > [ 173.901361] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000039
> > [ 174.040360] RBP: ffff88007b497dc0 R08: 000000000000000a R09: 000000000000fffb
> > [ 174.235587] R10: 0000000000000000 R11: 0000000000000199 R12: ffff88007b9092c8
> > [ 174.385067] R13: 0000000000000000 R14: ffffffffa0199580 R15: ffffffffa0195773
> > [ 174.541288] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> > [ 174.620856] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 174.740551] CR2: 00007fefd16c5168 CR3: 000000007bb41000 CR4: 00000000000006f0
> > [ 174.948095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 175.076881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 175.320731] Process kworker/0:3 (pid: 385, threadinfo ffff88007b496000, task ffff880079735bc0)
> > [ 175.565218] Stack:
> > [ 175.630655] 0000000000000000 ffff88007b909298 ffff88007b909690 ffff88007b9093d0
> > [ 175.699571] ffff88007b909418 ffff88007fc0e300 ffff88007b497df0 ffffffffa018525c
> > [ 175.710012] ffff88007b909690 ffff880078e4d800 ffff88007fc1bf00 ffff88007fc0e340
> > [ 175.859748] Call Trace:
> > [ 175.909572] [<ffffffffa018525c>] con_work+0x14c/0x1c0 [libceph]
> > [ 176.010436] [<ffffffff810763b6>] process_one_work+0x136/0x550
> > [ 176.131098] [<ffffffffa0185110>] ? try_read+0x440/0x440 [libceph]
> > [ 176.249904] [<ffffffff810775b5>] worker_thread+0x165/0x3c0
> > [ 176.368412] [<ffffffff81077450>] ? manage_workers+0x190/0x190
> > [ 176.512415] [<ffffffff8107c5e3>] kthread+0x93/0xa0
> > [ 176.623469] [<ffffffff816b8c04>] kernel_thread_helper+0x4/0x10
> > [ 176.670502] [<ffffffff8107c550>] ? flush_kthread_worker+0xb0/0xb0
> > [ 176.731089] [<ffffffff816b8c00>] ? gs_change+0x13/0x13
> > [ 176.901284] Code: 00 00 00 00 48 8b 83 38 01 00 00 a8 02 0f 85 f6 fe ff ff 3e 80 a3 38 01 00 00 fb 48 c7 83 40 01 00 00 06 00 00 00 e9 37 ff ff ff <0f> 0b 0f 0b 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8
> > [ 177.088895] RIP [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
> > [ 177.251573] RSP <ffff88007b497d90>
> > [ 177.310320] ---[ end trace f66ddfdda09b9821 ]---
>
> OK, it definitely shouldn't have crashed here, and there is a patch in
> later kernels that prevents this crash from happening. But because
> 3.6 is too old and lacks required features, which is reported just
> prior to the crash splat, you wouldn't be able to use it with firefly
> userspace even if it didn't crash.
>
> You are going to need to run at least 3.9 and then disable the vary_r
> tunable in your crushmap (vary_r will only be supported starting with
> 3.15) or primary-affinity adjustments - I can't tell which one it is
> just from the feature set mismatch message.
>
> Thanks,
>
> Ilya
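
For anyone reading the "feature set mismatch" line in the trace: the three numbers look like hexadecimal feature bitmasks, and the "missing" value is consistent with the server's bits that the client kernel does not advertise. Here is a minimal Python sketch of that arithmetic, using the values from the log. The helper names missing_features and set_bits are made up for illustration and are not part of Ceph. The sketch only lists bit positions; mapping them to feature names would need the feature definitions in the Ceph source, which are not reproduced here.

```python
def missing_features(client: int, server: int) -> int:
    """Return the bits set in the server mask but absent from the client mask."""
    return server & ~client


def set_bits(mask: int) -> list:
    """Return the positions (0 = least significant) of the bits set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Values copied from the libceph log line, parsed as hexadecimal.
client = int("40002", 16)        # "my 40002"
server = int("20042040002", 16)  # "server's 20042040002"

missing = missing_features(client, server)
print(f"missing mask: {missing:x}")    # matches "missing 20042000000" in the log
print("missing bit positions:", set_bits(missing))
```

The computed mask matches the "missing" value the kernel printed, so the same two helpers can be reused on any other mismatch line by swapping in its "my" and "server's" values.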