One additional note: I've got a fair amount of data on the rbd volume, which
I need to recover one way or another.


On Fri, Aug 1, 2014 at 2:41 PM, Christopher O'Connell <cjo at sendfaster.com>
wrote:

> So I've been having a seemingly similar problem and while trying to follow
> the steps in this thread, things have gone very south for me.
>
> Kernel on OSDs and MONs: 2.6.32-431.20.3.0.1.el6.centos.plus.x86_64 #1 SMP
> Wed Jul 16 21:27:52 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Kernel on RBD host: 3.2.0-61-generic #93-Ubuntu SMP Fri May 2 21:31:50 UTC
> 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> All are running 0.80.5
>
> I updated the tunables as per this article
> http://cephnotes.ksperis.com/blog/2014/01/16/set-tunables-optimal-on-ceph-crushmap
>
> Here's what's happening:
>
> 1) On the rbd client node, trying to map rbd produces
> $ sudo rbd --conf /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring map poolname
>
> rbd: add failed: (5) Input/output error
>
> Dmesg:
>
> [331172.147289] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
> [331172.154059] libceph: mon0 10.103.11.132:6789 missing required protocol features
> [331182.176604] libceph: mon1 10.103.11.141:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
> [331182.183535] libceph: mon1 10.103.11.141:6789 missing required protocol features
> [331192.192630] libceph: mon2 10.103.11.152:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
> [331192.199810] libceph: mon2 10.103.11.152:6789 missing required protocol features
> [331202.209324] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
> [331202.216957] libceph: mon0 10.103.11.132:6789 missing required protocol features
> [331212.224540] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
> [331212.232276] libceph: mon0 10.103.11.132:6789 missing required protocol features
> [331222.240605] libceph: mon2 10.103.11.152:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
> [331222.248660] libceph: mon2 10.103.11.152:6789 missing required protocol features
>
> However, running
> $ sudo rbd --conf /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring ls
> poolname
>
> works fine and shows the expected pool name.
>
> 2) On the monitor where I ran the command to update the tunables, I can no
> longer run the ceph console:
> $ ceph -c /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring
> 2014-08-01 17:32:05.026960 7f21943d2700 0 -- 10.103.11.132:0/1030058 >> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42360 s=1 pgs=0 cs=0 l=1 c=0x7f21900286a0).connect protocol feature mismatch, my fffffffff < peer 20fffffffff missing 20000000000
> 2014-08-01 17:32:05.027024 7f21943d2700 0 -- 10.103.11.132:0/1030058 >> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42360 s=1 pgs=0 cs=0 l=1 c=0x7f21900286a0).fault
> 2014-08-01 17:32:05.027544 7f21943d2700 0 -- 10.103.11.132:0/1030058 >> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42361 s=1 pgs=0 cs=0 l=1 c=0x7f21900286a0).connect protocol feature mismatch, my fffffffff < peer 20fffffffff missing 20000000000
>
> and it just keeps spitting out a similar message. However, I *can* run the
> ceph console and execute basic commands (status, at the very least) from
> other nodes.
>
> At this point, I'm reluctant to continue without some advice from someone
> else. I can certainly try upgrading the kernel on the rbd client, but I'm
> worried I may just make things worse.
>
> All the best,
>
> ~ Christopher
>
>
> On Fri, Aug 1, 2014 at 1:34 PM, Larry Liu <larryliugml at gmail.com> wrote:
>
>> Hi Ilya, thank you sooooo much! I didn't know my crush map was all
>> messed up. Now all is working! I guess it would have worked even without
>> upgrading the kernel from 3.2 to 3.13.
>>
>>
>> On Aug 1, 2014, at 12:48 PM, Ilya Dryomov <ilya.dryomov at inktank.com>
>> wrote:
>>
>> > On Fri, Aug 1, 2014 at 10:32 PM, Larry Liu <larryliugml at gmail.com>
>> > wrote:
>> >> crushmap file is attached. I'm running kernel 3.13.0-29-generic after
>> >> another person suggested it. But the kernel upgrade didn't fix anything
>> >> for me. Thanks!
>> >
>> > So there are two problems. First, you either have erasure pools or had
>> > them in the past. Unfortunately there is currently a bug that prevents
>> > the kernel client from working in these circumstances even if you are
>> > pointing it at "normal" replicated pools, such as rbd. Your options
>> > are to either upgrade to kernel 3.14 or remove all erasure coded pools
>> > and the erasure rule.
>> >
>> > ceph osd pool delete foo
>> > ceph osd pool delete bar
>> > ceph osd crush rule rm erasure-code
>> >
>> > Regardless of whether you upgrade to 3.14 or choose to get rid of your
>> > erasure pools you'll also have to do
>> >
>> > ceph osd getcrushmap -o /tmp/crush
>> > crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
>> > ceph osd setcrushmap -i /tmp/crush.new
>> >
>> > to take care of the second problem.
>> >
>> > Thanks,
>> >
>> > Ilya
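
For anyone comparing the two mismatch messages quoted in this thread: the
"missing" value is the set of feature bits the server requires and the client
does not advertise. Below is a minimal Python sketch that lists those bit
positions. It assumes nothing beyond the hex masks printed in the logs; the
function name is made up for illustration and is not part of any Ceph tool.

```python
def missing_feature_bits(mine_hex, server_hex):
    """Return the bit positions set in the server mask but not in ours.

    Both arguments are hex strings exactly as they appear in the log
    messages (no 0x prefix). Hypothetical helper, not a Ceph API.
    """
    mine = int(mine_hex, 16)
    server = int(server_hex, 16)
    missing = server & ~mine  # bits the server has that we lack
    return [bit for bit in range(missing.bit_length()) if (missing >> bit) & 1]


# Masks copied from the dmesg output (kernel rbd client) and from the
# ceph CLI error on the monitor node, both quoted in this thread.
kernel_client = missing_feature_bits("2", "20042040002")
userspace_cli = missing_feature_bits("fffffffff", "20fffffffff")

print("kernel rbd client is missing bits:", kernel_client)
print("ceph CLI on the monitor is missing bits:", userspace_cli)
```

This prints [18, 25, 30, 41] for the kernel rbd client and [41] for the ceph
CLI, so bit 41 is common to both failures. Which named feature each bit
corresponds to depends on the Ceph release and should be checked against the
feature definitions for the version in use.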