I got the stack trace when it crashed. I had to enable the serial port to capture it. Would this help?

[ 172.227318] libceph: mon0 192.168.56.102:6789 feature set mismatch, my 40002 < server's 20042040002, missing 20042000000
[ 172.451109] libceph: mon0 192.168.56.102:6789 socket error on read
[ 172.539837] ------------[ cut here ]------------
[ 172.640704] kernel BUG at /home/apw/COD/linux/net/ceph/messenger.c:2366!
[ 172.740775] invalid opcode: 0000 [#1] SMP
[ 172.805429] Modules linked in: rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ext2 ppdev microcode psmouse serio_raw parport_pc i2c_piix4 mac_hid lp parport e1000
[ 173.072985] CPU 0
[ 173.143909] Pid: 385, comm: kworker/0:3 Not tainted 3.6.9-030609-generic #201212031610 innotek GmbH VirtualBox/VirtualBox
[ 173.358836] RIP: 0010:[<ffffffffa0183ff7>] [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
[ 173.629918] RSP: 0018:ffff88007b497d90 EFLAGS: 00010286
[ 173.731786] RAX: fffffffffffffffe RBX: ffff88007b909298 RCX: 0000000000000003
[ 173.901361] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000039
[ 174.040360] RBP: ffff88007b497dc0 R08: 000000000000000a R09: 000000000000fffb
[ 174.235587] R10: 0000000000000000 R11: 0000000000000199 R12: ffff88007b9092c8
[ 174.385067] R13: 0000000000000000 R14: ffffffffa0199580 R15: ffffffffa0195773
[ 174.541288] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 174.620856] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 174.740551] CR2: 00007fefd16c5168 CR3: 000000007bb41000 CR4: 00000000000006f0
[ 174.948095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 175.076881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 175.320731] Process kworker/0:3 (pid: 385, threadinfo ffff88007b496000, task ffff880079735bc0)
[ 175.565218] Stack:
[ 175.630655] 0000000000000000 ffff88007b909298 ffff88007b909690 ffff88007b9093d0
[ 175.699571] ffff88007b909418 ffff88007fc0e300 ffff88007b497df0 ffffffffa018525c
[ 175.710012] ffff88007b909690 ffff880078e4d800 ffff88007fc1bf00 ffff88007fc0e340
[ 175.859748] Call Trace:
[ 175.909572] [<ffffffffa018525c>] con_work+0x14c/0x1c0 [libceph]
[ 176.010436] [<ffffffff810763b6>] process_one_work+0x136/0x550
[ 176.131098] [<ffffffffa0185110>] ? try_read+0x440/0x440 [libceph]
[ 176.249904] [<ffffffff810775b5>] worker_thread+0x165/0x3c0
[ 176.368412] [<ffffffff81077450>] ? manage_workers+0x190/0x190
[ 176.512415] [<ffffffff8107c5e3>] kthread+0x93/0xa0
[ 176.623469] [<ffffffff816b8c04>] kernel_thread_helper+0x4/0x10
[ 176.670502] [<ffffffff8107c550>] ? flush_kthread_worker+0xb0/0xb0
[ 176.731089] [<ffffffff816b8c00>] ? gs_change+0x13/0x13
[ 176.901284] Code: 00 00 00 00 48 8b 83 38 01 00 00 a8 02 0f 85 f6 fe ff ff 3e 80 a3 38 01 00 00 fb 48 c7 83 40 01 00 00 06 00 00 00 e9 37 ff ff ff <0f> 0b 0f 0b 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8
[ 177.088895] RIP [<ffffffffa0183ff7>] ceph_fault+0x267/0x270 [libceph]
[ 177.251573] RSP <ffff88007b497d90>
[ 177.310320] ---[ end trace f66ddfdda09b9821 ]---
[ 177.461430] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 177.464615] IP: [<ffffffff8107c8b1>] kthread_data+0x11/0x20
[ 177.464615] PGD 1c0e067 PUD 1c0f067 PMD 0
[ 177.464615] Oops: 0000 [#2] SMP
[ 177.464615] Modules linked in: rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc ext2 ppdev microcode psmouse serio_raw parport_pc i2c_piix4 mac_hid lp parport e1000
[ 177.464615] CPU 0
[ 177.464615] Pid: 385, comm: kworker/0:3 Tainted: G D 3.6.9-030609-generic #201212031610 innotek GmbH VirtualBox/VirtualBox
[ 177.464615] RIP: 0010:[<ffffffff8107c8b1>] [<ffffffff8107c8b1>] kthread_data+0x11/0x20
[ 177.464615] RSP: 0018:ffff88007b497a70 EFLAGS: 00010096
[ 177.464615] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 177.464615] RDX: ffffffff81e593c0 RSI: 0000000000000000 RDI: ffff880079735bc0
[ 177.464615] RBP: ffff88007b497a88 R08: 0000000000989680 R09: 0000000000000400
[ 177.464615] R10: 0000000000000000 R11: ffff880078fb09e0 R12: 0000000000000000
[ 177.464615] R13: ffff880079735f90 R14: 0000000000000001 R15: 0000000000000006
[ 177.464615] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 177.464615] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 177.464615] CR2: fffffffffffffff8 CR3: 000000007b73e000 CR4: 00000000000006f0
[ 177.464615] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 177.464615] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 177.464615] Process kworker/0:3 (pid: 385, threadinfo ffff88007b496000, task ffff880079735bc0)
[ 177.464615] Stack:
[ 177.464615] ffffffff81077dc5 ffff88007b497a88 ffff88007fc13dc0 ffff88007b497b08
[ 177.464615] ffffffff816ade3f ffff88007b497ab8 0000000000000000 ffff88007b497fd8
[ 177.464615] ffff88007b497fd8 ffff88007b497fd8 0000000000013dc0 ffff880078d8d618
[ 177.464615] Call Trace:
[ 177.464615] [<ffffffff81077dc5>] ? wq_worker_sleeping+0x15/0xc0
[ 177.464615] [<ffffffff816ade3f>] __schedule+0x5cf/0x6f0
[ 177.464615] [<ffffffff816ae279>] schedule+0x29/0x70
[ 177.464615] [<ffffffff8105d793>] do_exit+0x2b3/0x470
[ 177.464615] [<ffffffff816b04a0>] oops_end+0xb0/0xf0
[ 177.464615] [<ffffffff81017c78>] die+0x58/0x90
[ 177.464615] [<ffffffff816afd94>] do_trap+0xc4/0x170
[ 177.464615] [<ffffffff81015385>] do_invalid_op+0x95/0xb0
[ 177.464615] [<ffffffffa0183ff7>] ? ceph_fault+0x267/0x270 [libceph]
[ 177.464615] [<ffffffff81340fd1>] ? vsnprintf+0x461/0x600
[ 177.464615] [<ffffffff816b8a7b>] invalid_op+0x1b/0x20
[ 177.464615] [<ffffffffa0183ff7>] ? ceph_fault+0x267/0x270 [libceph]
[ 177.464615] [<ffffffffa018525c>] con_work+0x14c/0x1c0 [libceph]
[ 177.464615] [<ffffffff810763b6>] process_one_work+0x136/0x550
[ 177.464615] [<ffffffffa0185110>] ? try_read+0x440/0x440 [libceph]
[ 177.464615] [<ffffffff810775b5>] worker_thread+0x165/0x3c0
[ 177.464615] [<ffffffff81077450>] ? manage_workers+0x190/0x190
[ 177.464615] [<ffffffff8107c5e3>] kthread+0x93/0xa0
[ 177.464615] [<ffffffff816b8c04>] kernel_thread_helper+0x4/0x10
[ 177.464615] [<ffffffff8107c550>] ? flush_kthread_worker+0xb0/0xb0
[ 177.464615] [<ffffffff816b8c00>] ? gs_change+0x13/0x13
[ 177.464615] Code: ff ff eb 88 be 57 01 00 00 48 c7 c7 38 3f a2 81 e8 75 a6 fd ff e9 b4 fe ff ff 55 48 89 e5 0f 1f 44 00 00 48 8b 87 78 03 00 00 5d <48> 8b 40 f8 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f
[ 177.464615] RIP [<ffffffff8107c8b1>] kthread_data+0x11/0x20
[ 177.464615] RSP <ffff88007b497a70>
[ 177.464615] CR2: fffffffffffffff8
[ 177.464615] ---[ end trace f66ddfdda09b9822 ]---
[ 177.464615] Fixing recursive fault but reboot is needed!

On Mon, May 19, 2014 at 1:26 PM, Ilya Dryomov <ilya.dryomov at inktank.com> wrote:

> On Mon, May 19, 2014 at 8:37 PM, Jay Janardhan <jay.janardhan at kaseya.com> wrote:
> > Ilya, the SysRq is not doing anything as the kernel is hung. Btw, this is a
> > VirtualBox environment, so I used VBoxManage to send the SysRq commands.
> > Just to let you know, the system locks up and the only way out is a hard
> > reset.
>
> Well, that's not much to go on. Was there something in dmesg when it
> locked up or in response to SysRqs?
>
> Thanks,
>
> Ilya
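
For anyone reading the first libceph line in the log above ("feature set mismatch, my 40002 < server's 20042040002, missing 20042000000"): the three values appear to be hexadecimal bitmasks, and the "missing" value looks like the server's bits that the client lacks. The sketch below checks that arithmetic on the numbers from the log and lists the bit positions involved. The function names are hypothetical, and which Ceph feature each bit stands for is not stated in this thread, so it would need to be looked up in the Ceph feature definitions.

```python
def missing_features(client, server):
    """Bits set in the server mask that are absent from the client mask."""
    return server & ~client


def set_bits(mask):
    """Positions (0 = least significant) of the bits set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Values copied from the log line, read as hexadecimal (an assumption).
client = 0x40002
server = 0x20042040002

missing = missing_features(client, server)
print(hex(missing))        # compare with the "missing" value in the log
print(set_bits(client))    # feature bits the kernel client advertises
print(set_bits(missing))   # feature bits the monitor has that the client lacks
```

If the result matches the logged "missing" value, the bit positions it prints are the ones the 3.6.9 kernel client does not support and the monitor advertises.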