Re: "protocol feature mismatch" after upgrading to Hammer

http://people.beocat.cis.ksu.edu/~kylehutson/crushmap

On Thu, Apr 9, 2015 at 11:25 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
Hmmm. That does look right and neither I nor Sage can come up with
anything via code inspection. Can you post the actual binary crush map
somewhere for download so that we can inspect it with our tools?
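(e.g., something along the lines of

  ceph osd getcrushmap -o /tmp/crushmap   # output path is just an example

should hand back the compiled/binary form directly.)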
-Greg

On Thu, Apr 9, 2015 at 7:57 AM, Kyle Hutson <kylehutson@xxxxxxx> wrote:
> Here 'tis:
> https://dpaste.de/POr1
>
>
> On Thu, Apr 9, 2015 at 9:49 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> Can you dump your crush map and post it on pastebin or something?
>>
>> On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson <kylehutson@xxxxxxx> wrote:
>> > Nope - it's 64-bit.
>> >
>> > (Sorry, I missed the reply-all last time.)
>> >
>> > On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> >>
>> >> [Re-added the list]
>> >>
>> >> Hmm, I'm checking the code and that shouldn't be possible. What's your
>> >> client? (In particular, is it 32-bit? That's the only thing I can
>> >> think of that might have slipped through our QA.)
>> >>
>> >> On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson <kylehutson@xxxxxxx> wrote:
>> >> > I did nothing to enable anything else. Just changed my ceph repo from
>> >> > 'giant' to 'hammer', then did 'yum update' and restarted services.
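>> >> > (In other words, roughly:
>> >> >
>> >> >   sed -i 's/giant/hammer/' /etc/yum.repos.d/ceph.repo   # actual repo file path may differ
>> >> >   yum update
>> >> >   /etc/init.d/ceph restart   # or however your init system restarts the daemons
>> >> >
>> >> > -- nothing that touched the crush map or tunables on purpose.)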
>> >> >
>> >> > On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum <greg@xxxxxxxxxxx>
>> >> > wrote:
>> >> >>
>> >> >> Did you enable the straw2 stuff? CRUSH_V4 shouldn't be required by
>> >> >> the cluster unless you made changes to the layout requiring it.
>> >> >>
>> >> >> If you did, the clients have to be upgraded to understand it. You
>> >> >> could disable all the v4 features; that should let them connect again.
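>> >> >>
>> >> >> (For example -- untested, so double-check against your own map -- if any
>> >> >> buckets did end up as straw2, flipping them back to straw should drop the
>> >> >> CRUSH_V4 requirement; filenames below are arbitrary:
>> >> >>
>> >> >>   ceph osd getcrushmap -o crushmap.bin
>> >> >>   crushtool -d crushmap.bin -o crushmap.txt
>> >> >>   sed -i 's/alg straw2/alg straw/' crushmap.txt   # or edit by hand
>> >> >>   crushtool -c crushmap.txt -o crushmap.new
>> >> >>   ceph osd setcrushmap -i crushmap.new
>> >> >>
>> >> >> Re-injecting a modified map can shuffle some data around, so do it with
>> >> >> that in mind.)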
>> >> >> -Greg
>> >> >>
>> >> >> On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson <kylehutson@xxxxxxx>
>> >> >> wrote:
>> >> >> > This particular problem I just figured out myself ('ceph -w' was
>> >> >> > still running from before the upgrade, and ctrl-c and restarting
>> >> >> > solved that issue), but I'm still having a similar problem on the
>> >> >> > ceph client:
>> >> >> >
>> >> >> > libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca
>> >> >> > < server's 102b84a042aca, missing 1000000000000
>> >> >> >
>> >> >> > It appears that even the latest kernel doesn't have support for
>> >> >> > CEPH_FEATURE_CRUSH_V4
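>> >> >> >
>> >> >> > (Quick sanity check on that -- the missing bit works out to bit 48,
>> >> >> > which if I'm reading ceph_features.h right is exactly CRUSH_V4:
>> >> >> >
>> >> >> >   $ printf '%x\n' $(( 0x102b84a042aca ^ 0x2b84a042aca ))
>> >> >> >   1000000000000
>> >> >> >   $ printf '%x\n' $(( 1 << 48 ))
>> >> >> >   1000000000000
>> >> >> > )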
>> >> >> >
>> >> >> > How do I make my ceph cluster backward-compatible with the old
>> >> >> > cephfs client?
>> >> >> >
>> >> >> > On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson <kylehutson@xxxxxxx>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> I upgraded from giant to hammer yesterday and now 'ceph -w' is
>> >> >> >> constantly repeating this message:
>> >> >> >>
>> >> >> >> 2015-04-09 08:50:26.318042 7f95dbf86700  0 -- 10.5.38.1:0/2037478 >>
>> >> >> >> 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1
>> >> >> >> c=0x7f95e0023670).connect protocol feature mismatch, my 3fffffffffff
>> >> >> >> < peer 13fffffffffff missing 1000000000000
>> >> >> >>
>> >> >> >> It isn't always the same IP for the destination - here's another:
>> >> >> >> 2015-04-09 08:50:20.322059 7f95dc087700  0 -- 10.5.38.1:0/2037478 >>
>> >> >> >> 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1
>> >> >> >> c=0x7f95e002b480).connect protocol feature mismatch, my 3fffffffffff
>> >> >> >> < peer 13fffffffffff missing 1000000000000
>> >> >> >>
>> >> >> >> Some details about our install:
>> >> >> >> We have 24 hosts with 18 OSDs each. 16 per host are spinning disks
>> >> >> >> in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD
>> >> >> >> partitions used for a caching tier in front of the EC pool. All 24
>> >> >> >> hosts are monitors. 4 hosts are mds. We are running cephfs with a
>> >> >> >> client trying to write data over cephfs when we're seeing these
>> >> >> >> messages.
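>> >> >> >>
>> >> >> >> (For context, that's the stock EC-pool-behind-a-cache-tier layout,
>> >> >> >> i.e. roughly the equivalent of -- pool names and pg counts below are
>> >> >> >> made up:
>> >> >> >>
>> >> >> >>   ceph osd erasure-code-profile set ecprofile k=8 m=4
>> >> >> >>   ceph osd pool create ecpool 4096 4096 erasure ecprofile
>> >> >> >>   ceph osd pool create cachepool 4096
>> >> >> >>   ceph osd tier add ecpool cachepool
>> >> >> >>   ceph osd tier cache-mode cachepool writeback
>> >> >> >>   ceph osd tier set-overlay ecpool cachepool
>> >> >> >> )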
>> >> >> >>
>> >> >> >> Any ideas?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > ceph-users mailing list
>> >> >> > ceph-users@xxxxxxxxxxxxxx
>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
