On Sat, Aug 2, 2014 at 1:41 AM, Christopher O'Connell <cjo at sendfaster.com> wrote:
> So I've been having a seemingly similar problem and while trying to follow
> the steps in this thread, things have gone very south for me.

Show me where in this thread I said to set tunables to optimal ;)
optimal (== firefly for firefly) is actually the opposite of what you
are going to need.

>
> Kernel on OSDs and MONs: 2.6.32-431.20.3.0.1.el6.centos.plus.x86_64 #1 SMP
> Wed Jul 16 21:27:52 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Kernel on RBD host: 3.2.0-61-generic #93-Ubuntu SMP Fri May 2 21:31:50 UTC
> 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> All are running 0.80.5

Is this a new firefly cluster or was it created before firefly
(specifically before v0.78) and then upgraded?

>
> I updated the tunables as per this article
> http://cephnotes.ksperis.com/blog/2014/01/16/set-tunables-optimal-on-ceph-crushmap
>
> Here's what's happening:
>
> 1) On the rbd client node, trying to map rbd produces
> $ sudo rbd --conf /etc/ceph/mia1.conf --keyring
> /etc/ceph/mia1.client.admin.keyring map poolname
>
> rbd: add failed: (5) Input/output error
>
> Dmesg:
>
> [331172.147289] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2
> < server's 20042040002, missing 20042040000
> [331172.154059] libceph: mon0 10.103.11.132:6789 missing required protocol
> features
> [331182.176604] libceph: mon1 10.103.11.141:6789 feature set mismatch, my 2
> < server's 20042040002, missing 20042040000
> [331182.183535] libceph: mon1 10.103.11.141:6789 missing required protocol
> features
> [331192.192630] libceph: mon2 10.103.11.152:6789 feature set mismatch, my 2
> < server's 20042040002, missing 20042040000
> [331192.199810] libceph: mon2 10.103.11.152:6789 missing required protocol
> features
> [331202.209324] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2
> < server's 20042040002, missing 20042040000
> [331202.216957] libceph: mon0 10.103.11.132:6789 missing required protocol
> features
> [331212.224540] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2
> < server's 20042040002, missing 20042040000
> [331212.232276] libceph: mon0 10.103.11.132:6789 missing required protocol
> features
> [331222.240605] libceph: mon2 10.103.11.152:6789 feature set mismatch, my 2
> < server's 20042040002, missing 20042040000
> [331222.248660] libceph: mon2 10.103.11.152:6789 missing required protocol
> features
>
> However, running
> $ sudo rbd --conf /etc/ceph/mia1.conf --keyring
> /etc/ceph/mia1.client.admin.keyring ls
> poolname
>
> works fine and shows the expected pool name.
>
> 2) On the monitor where I ran the command to update the tunables, I can no
> longer run the ceph console:
> $ ceph -c /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring
> 2014-08-01 17:32:05.026960 7f21943d2700 0 -- 10.103.11.132:0/1030058 >>
> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42360 s=1 pgs=0 cs=0 l=1
> c=0x7f21900286a0).connect protocol feature mismatch, my fffffffff < peer
> 20fffffffff missing 20000000000
> 2014-08-01 17:32:05.027024 7f21943d2700 0 -- 10.103.11.132:0/1030058 >>
> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42360 s=1 pgs=0 cs=0 l=1
> c=0x7f21900286a0).fault
> 2014-08-01 17:32:05.027544 7f21943d2700 0 -- 10.103.11.132:0/1030058 >>
> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42361 s=1 pgs=0 cs=0 l=1
> c=0x7f21900286a0).connect protocol feature mismatch, my fffffffff < peer
> 20fffffffff missing 20000000000
>
> and it just keeps spitting out a similar message. However I *can* run the
> ceph console and execute basic commands (status, at the very least) from
> other nodes.

What does ceph -s from those other nodes say?  Check the versions of all
monitors with

    ceph daemon mon.<id> version

>
> At this point, I'm reluctant to continue without some advice from someone
> else. I can certainly try upgrading the kernel on the rbd client, but I'm
> worried I may just make things worse.

Upgrading the kernel won't make things worse, it's just a client.  I'm
pretty sure we can make this work with 3.2, but if you actually plan on
using krbd for anything serious, I'd recommend an upgrade to 3.14.  3.13
will do too, if you don't plan on having any erasure-coded pools in your
cluster.

Thanks,

                Ilya
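
P.S. In case it helps with reading those "feature set mismatch" lines: the
"missing" value is just the set of feature bits the server requires that
the client does not have, i.e. server & ~mine. Below is a minimal Python
sketch that decodes the two mismatches quoted above into bit positions.
The helper name missing_feature_bits is made up for illustration; it is
not part of any Ceph tool.

```python
def missing_feature_bits(mine_hex, server_hex):
    """Return (missing mask as a hex string, sorted list of missing bit positions).

    Both arguments are the hex strings exactly as they appear in the
    libceph / messenger log lines (no 0x prefix).
    """
    mine = int(mine_hex, 16)
    server = int(server_hex, 16)
    # Bits the server has set that the client does not.
    missing = server & ~mine
    bits = [i for i in range(missing.bit_length()) if (missing >> i) & 1]
    return format(missing, "x"), bits


# krbd client on the 3.2 kernel vs. the monitors (from the dmesg output)
print(missing_feature_bits("2", "20042040002"))
# -> ('20042040000', [18, 25, 30, 41])

# userspace ceph CLI on the first monitor node vs. its peer monitor
print(missing_feature_bits("fffffffff", "20fffffffff"))
# -> ('20000000000', [41])
```

If I remember the feature table correctly, bits 18 and 25 are
CRUSH_TUNABLES and CRUSH_TUNABLES2, bit 30 is OSDHASHPSPOOL and bit 41 is
CRUSH_TUNABLES3, but please check ceph_features.h for your version rather
than taking my word for it.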