To be clearer about my question: we currently use ELRepo on those rare
occasions when we need a 3.x kernel on CentOS. Are you aware of anyone
maintaining a 3.14 kernel?

On Sat, Aug 2, 2014 at 3:01 PM, Christopher O'Connell <cjo at sendfaster.com> wrote:

> Hi Ilya,
>
> Short of building a 3.14 kernel from scratch, are there any CentOS/EL
> kernels at 3.14.x, i.e. at least 3.14 but older than 3.15?
>
> Is the fix in 3.15 yet? I just installed 3.15.8.
>
> All the best,
>
> ~ Christopher
>
> Al
>
>
> On Sat, Aug 2, 2014 at 11:20 AM, Ilya Dryomov <ilya.dryomov at inktank.com>
> wrote:
>
>> On Sat, Aug 2, 2014 at 10:03 PM, Christopher O'Connell
>> <cjo at sendfaster.com> wrote:
>> > On Sat, Aug 2, 2014 at 6:27 AM, Ilya Dryomov <ilya.dryomov at inktank.com>
>> > wrote:
>> >>
>> >> On Sat, Aug 2, 2014 at 1:41 AM, Christopher O'Connell
>> >> <cjo at sendfaster.com> wrote:
>> >> > So I've been having a seemingly similar problem, and while trying to
>> >> > follow the steps in this thread, things have gone very south for me.
>> >>
>> >> Show me where in this thread I said to set tunables to optimal ;)
>> >> optimal (== firefly for firefly) is actually the opposite of what you
>> >> are going to need.
>> >
>> >
>> > So what should tunables be set to? Optimal?
>>
>> Ordinarily yes, but not if you are going to use older kernels. In that
>> case you'd want "default" or "legacy".
>>
>> >
>> >>
>> >>
>> >> >
>> >> > Kernel on OSDs and MONs: 2.6.32-431.20.3.0.1.el6.centos.plus.x86_64 #1 SMP
>> >> > Wed Jul 16 21:27:52 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> >> >
>> >> > Kernel on RBD host: 3.2.0-61-generic #93-Ubuntu SMP Fri May 2 21:31:50 UTC
>> >> > 2014 x86_64 x86_64 x86_64 GNU/Linux
>> >> >
>> >> > All are running 0.80.5.
>> >>
>> >> Is this a new firefly cluster or was it created before firefly
>> >> (specifically before v0.78) and then upgraded?
>> >
>> >
>> > It was created before 0.78 and upgraded. It has also been expanded several
>> > times.
>> >
>> >>
>> >>
>> >> >
>> >> > I updated the tunables as per this article
>> >> >
>> >> > http://cephnotes.ksperis.com/blog/2014/01/16/set-tunables-optimal-on-ceph-crushmap
>> >> >
>> >> > Here's what's happening:
>> >> >
>> >> > 1) On the rbd client node, trying to map rbd produces
>> >> > $ sudo rbd --conf /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring map poolname
>> >> >
>> >> > rbd: add failed: (5) Input/output error
>> >> >
>> >> > Dmesg:
>> >> >
>> >> > [331172.147289] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
>> >> > [331172.154059] libceph: mon0 10.103.11.132:6789 missing required protocol features
>> >> > [331182.176604] libceph: mon1 10.103.11.141:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
>> >> > [331182.183535] libceph: mon1 10.103.11.141:6789 missing required protocol features
>> >> > [331192.192630] libceph: mon2 10.103.11.152:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
>> >> > [331192.199810] libceph: mon2 10.103.11.152:6789 missing required protocol features
>> >> > [331202.209324] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
>> >> > [331202.216957] libceph: mon0 10.103.11.132:6789 missing required protocol features
>> >> > [331212.224540] libceph: mon0 10.103.11.132:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
>> >> > [331212.232276] libceph: mon0 10.103.11.132:6789 missing required protocol features
>> >> > [331222.240605] libceph: mon2 10.103.11.152:6789 feature set mismatch, my 2 < server's 20042040002, missing 20042040000
>> >> > [331222.248660] libceph: mon2 10.103.11.152:6789 missing required protocol features
>> >> >
>> >> > However, running
>> >> > $ sudo rbd --conf /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring ls
>> >> > poolname
>> >> >
>> >> > works fine and shows the expected pool name.
>> >> >
>> >> > 2) On the monitor where I ran the command to update the tunables, I can
>> >> > no longer run the ceph console:
>> >> > $ ceph -c /etc/ceph/mia1.conf --keyring /etc/ceph/mia1.client.admin.keyring
>> >> > 2014-08-01 17:32:05.026960 7f21943d2700 0 -- 10.103.11.132:0/1030058 >> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42360 s=1 pgs=0 cs=0 l=1 c=0x7f21900286a0).connect protocol feature mismatch, my fffffffff < peer 20fffffffff missing 20000000000
>> >> > 2014-08-01 17:32:05.027024 7f21943d2700 0 -- 10.103.11.132:0/1030058 >> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42360 s=1 pgs=0 cs=0 l=1 c=0x7f21900286a0).fault
>> >> > 2014-08-01 17:32:05.027544 7f21943d2700 0 -- 10.103.11.132:0/1030058 >> 10.103.11.141:6789/0 pipe(0x7f2190028440 sd=3 :42361 s=1 pgs=0 cs=0 l=1 c=0x7f21900286a0).connect protocol feature mismatch, my fffffffff < peer 20fffffffff missing 20000000000
>> >> >
>> >> > and it just keeps spitting out a similar message. However I *can* run
>> >> > the ceph console and execute basic commands (status, at the very least)
>> >> > from other nodes.
>> >>
>> >> What does ceph -s from those other nodes say? Check versions of all
>> >> monitors with
>> >>
>> >> ceph daemon mon.<id> version
>> >
>> >
>> > So with some suggestions from people on IRC last night, it seems that
>> > several nodes didn't get librados upgraded, but still had 0.72. I'm not
>> > entirely sure how this happened, but I had to use yum-transaction to sort
>> > out the fact that python-librados went away for 0.80, and it's quite
>> > possible that I made a mistake and didn't upgrade these libraries.
>> > After manually getting all of the libraries up to date, the problems went
>> > away.
>> >
>> >>
>> >>
>> >> >
>> >> > At this point, I'm reluctant to continue without some advice from
>> >> > someone else. I can certainly try upgrading the kernel on the rbd
>> >> > client, but I'm worried I may just make things worse.
>> >>
>> >> Upgrading the kernel won't make things worse; it's just a client. I'm
>> >> pretty sure we can make this work with 3.2, but if you actually plan on
>> >> using krbd for anything serious, I'd recommend an upgrade to 3.14.
>> >>
>> >> 3.13 will do too, if you don't plan on having any erasure pools in your
>> >> cluster.
>> >
>> >
>> > I went ahead and upgraded to 3.15 and it sorted out the problems with the
>> > client.
>>
>> I hate to tell you this, but due to a subtle change in the kernel's
>> low-level primitives, rbd in 3.15 is prone to deadlocks. It will be fixed
>> in future 3.15 stable releases, but a couple of people have already run
>> into them and they are very reproducible under higher-than-average
>> loads, so you might want to downgrade to 3.14 and do
>>
>> ceph osd getcrushmap -o /tmp/crush
>> crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
>> ceph osd setcrushmap -i /tmp/crush.new
>>
>> to make "optimal" work with 3.14.
>>
>> Thanks,
>>
>> Ilya
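Both failures in this thread are feature-bit mismatches: each side advertises a hex bitmask of protocol features, and the "missing" value in the log is the peer's mask with the local mask's bits cleared. The sketch below assumes bash (its arithmetic is 64-bit, which is enough for these masks) and hex values copied verbatim from the log without a 0x prefix. It recomputes the missing value and lists the bit positions involved. The function names missing_mask and set_bits are hypothetical helpers, not part of any Ceph tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ceph) for reading "feature set mismatch"
# lines. Arguments are hex strings without a 0x prefix, copied from the log.

# Print, in hex, the feature bits the peer requires that we lack.
missing_mask() {
  local mine=$1 theirs=$2
  printf '%x\n' $(( 0x$theirs & ~0x$mine ))
}

# Print the positions of the set bits in a hex mask, lowest bit first.
set_bits() {
  local mask=$(( 0x$1 ))
  local bit
  local out=()
  for (( bit = 0; bit < 64; bit++ )); do
    if (( (mask >> bit) & 1 )); then
      out+=("$bit")
    fi
  done
  echo "${out[*]}"
}

# Kernel rbd client on 3.2: "my 2 < server's 20042040002"
missing_mask 2 20042040002            # prints 20042040000
set_bits 20042040000                  # prints 18 25 30 41

# ceph CLI still linked against 0.72 librados: "my fffffffff < peer 20fffffffff"
missing_mask fffffffff 20fffffffff    # prints 20000000000
set_bits 20000000000                  # prints 41
```

If I read Ceph's ceph_features.h correctly, bits 18, 25 and 41 correspond to CRUSH_TUNABLES, CRUSH_TUNABLES2 and CRUSH_TUNABLES3, and bit 30 to OSDHASHPSPOOL; treat that mapping as an assumption and confirm it against the header for your release. On that reading, the stale 0.72 librados lacked only the newest tunables bit, which fits the thread's finding that upgrading the libraries fixed the ceph CLI.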