Re: Cannot create keys for new 0.78 deployment - protocol mismatch

... and it was, bit of a foot gun there. Thanks Greg, you were right on the money.

So it looks like my 'make install' puts the libraries in /usr/lib (as expected):

$ ls -l /usr/lib/librados.so.2.0.0
-rwxr-xr-x 1 root root 92296482 Mar 26 12:57 /usr/lib/librados.so.2.0.0

whereas the Ubuntu packages are placing the corresponding file in /usr/lib/x86_64-linux-gnu:

$ ls -l /usr/lib/x86_64-linux-gnu/librados.so.2.0.0
-rw-r--r-- 1 root root 5679648 Mar 24 22:58 /usr/lib/x86_64-linux-gnu/librados.so.2.0.0

I guess that by default the 'ceph' utility is loading libraries from /usr/lib/x86_64-linux-gnu in preference to /usr/lib. For future reference, I'll try messing with LD_LIBRARY_PATH and friends next time this comes up! (Actually, I have a similar machine at home that still has the 0.72.2 libraries in /usr/lib/x86_64-linux-gnu, so I'll check that I've analyzed the cause properly.)
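
For anyone hitting the same thing, here is a minimal sketch of how to spot duplicate library installs like the two librados copies above. The helper name find_lib_copies is hypothetical (it is not part of Ceph), and the two directories are simply the ones seen on this machine; other distributions may use different paths.

```shell
# Hypothetical helper: list every file whose name starts with the given
# library name in each of the given directories, so that copies left
# behind by different installs are easy to spot.
find_lib_copies() {
    lib_name="$1"
    shift
    for dir in "$@"; do
        for f in "$dir"/"$lib_name"*; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
    return 0
}

# Usage, with the two directories from this thread:
#   find_lib_copies librados.so /usr/lib /usr/lib/x86_64-linux-gnu
```

If more than one copy turns up, `ldconfig -p | grep librados` lists the entries the dynamic linker cache knows about, which helps narrow down which copy is likely being picked up.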

Regards

Mark



On 26/03/14 13:26, Mark Kirkwood wrote:
I see I have librbd1 and librados2 at 0.72.2 (due to having qemu installed on this machine). That could be the source of the problem, I'll see if I can update them (I have pending updates I think), and report back.

Cheers

Mark

On 26/03/14 12:23, Mark Kirkwood wrote:
Yeah, it seems possible, however I'm installing 'em all the same way (in particular the 0.77 that works and the 0.77 that does not). The method is:

$ ./autogen.sh
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-radosgw
$ time make
$ sudo make install

[probably not relevant, but for completeness]

$ sudo cp src/init-ceph /etc/init.d/ceph
$ sudo cp src/init-radosgw /etc/init.d/radosgw
$ sudo chmod 755 /etc/init.d/radosgw
$ sudo cp src/upstart/* /etc/init
$ sudo cp udev/* /lib/udev/rules.d/


On 26/03/14 06:26, Gregory Farnum wrote:
So I don't remember exactly the relationships, but /usr/bin/ceph is a
bit of python wrapping some libraries. I think it should be getting
the version number from the right place, but wonder if some of them
aren't being updated appropriately. How are you installing these
binaries?
In particular, that feature bit looks like one of the new ones, so I
wonder if you somehow have an old library floating around somewhere in
there which supports everything except for that one feature.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Mar 24, 2014 at 9:53 PM, Mark Kirkwood
<mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
And more interestingly, 0.77-601-ge92d602 from 25 Feb does *not* have this issue - so it looks like something in the 0.77 development between 25 Feb and 18 Mar is triggering this.


On 25/03/14 16:52, Mark Kirkwood wrote:
Further - checking with 0.77 from 18th Mar shows the same problem, but 0.73 from 12 Dec 2013 does not have this issue. So anyway, looks like it is not a 0.78 problem, but is some sort of problem!

Regards

Mark

On 25/03/14 15:09, Mark Kirkwood wrote:
Fresh clone and rebuild results in the same error.

$ /usr/bin/ceph --version
ceph version 0.78-325-ge5a4f5e (e5a4f5ed005c9349be94b19ef33d6fe08271c798)

$ /usr/bin/ceph-mon --version
ceph version 0.78-325-ge5a4f5e (e5a4f5ed005c9349be94b19ef33d6fe08271c798)

On 25/03/14 14:16, Mark Kirkwood wrote:
Yeah, that is my feeling too - however both ceph and ceph-mon claim to be the same version...and the dates on the various binaries are all today.
But I'll rebuild anyway to double check!

Thanks

Mark

On 25/03/14 13:57, Gregory Farnum wrote:
That is pretty strange, but I think probably there's somehow a mismatch between the installed versions. Can you check with the --version flag on both binaries?

On Monday, March 24, 2014, Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:

Hi,

I'm redeploying my development cluster after building 0.78 from src on
Ubuntu 14.04. Ceph version is ceph version 0.78-325-ge5a4f5e.

So proceeding as usual:

$ ceph-deploy new vedavec
$ ceph-deploy mon create vedavec

The monitor comes up:

[vedavec][DEBUG ] ********************************************************************************
[vedavec][DEBUG ] status for monitor: mon.vedavec
[vedavec][DEBUG ] {
[vedavec][DEBUG ]   "election_epoch": 2,
[vedavec][DEBUG ]   "extra_probe_peers": [],
[vedavec][DEBUG ]   "monmap": {
[vedavec][DEBUG ]     "created": "0.000000",
[vedavec][DEBUG ]     "epoch": 1,
[vedavec][DEBUG ]     "fsid": "aa31c65c-bc94-4e19-940e-2e124c52ed2e",
[vedavec][DEBUG ]     "modified": "0.000000",
[vedavec][DEBUG ]     "mons": [
[vedavec][DEBUG ]       {
[vedavec][DEBUG ]         "addr": "192.168.2.63:6789/0",
[vedavec][DEBUG ]         "name": "vedavec",
[vedavec][DEBUG ]         "rank": 0
[vedavec][DEBUG ]       }
[vedavec][DEBUG ]     ]
[vedavec][DEBUG ]   },
[vedavec][DEBUG ]   "name": "vedavec",
[vedavec][DEBUG ]   "outside_quorum": [],
[vedavec][DEBUG ]   "quorum": [
[vedavec][DEBUG ]     0
[vedavec][DEBUG ]   ],
[vedavec][DEBUG ]   "rank": 0,
[vedavec][DEBUG ]   "state": "leader",
[vedavec][DEBUG ]   "sync_provider": []
[vedavec][DEBUG ] }
[vedavec][DEBUG ] ********************************************************************************
[vedavec][INFO  ] monitor: mon.vedavec is running
[vedavec][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.vedavec.asok mon_status


but I see that ceph-create-keys is hanging:

root      7820  0.1  0.0 138596 12372 ?        Ssl 13:10 0:00 /usr/bin/ceph-mon --cluster=ceph -i vedavec -f
root      7822  0.0  0.0  34040  7508 ?        Ss  13:10 0:00 python /usr/sbin/ceph-create-keys --cluster=ceph -i vedavec
root      7839  0.2  0.0 426604 16264 ?        Sl  13:10 0:00 python /usr/bin/ceph --cluster=ceph --name=mon. --keyring=/var/lib/ceph/mon/ceph-vedavec/keyring auth get-or-create client.admin mon allow * osd allow * mds allow

In fact, it's the auth get-or-create process that's the issue:

$ sudo ceph --name=mon. --keyring=/var/lib/ceph/mon/ceph-vedavec/keyring auth get
2014-03-25 13:22:31.591282 7fc048840700 0 -- 192.168.2.63:0/1010637 192.168.2.63:6789/0 pipe(0x7fc04c0202d0 sd=3 :43428 s=1 pgs=0 cs=0 l=1 c=0x7fc04c020530).connect protocol feature mismatch, my fffffffff < peer 1fffffffff missing 1000000000


So some sort of feature mismatch seems to be happening between the ceph client and ceph-mon, which seems a bit strange (they are both from the 0.78 checkout). I see that other instances of this type of error usually involve kernel (rbd etc.) clients lacking the new features, but I am a bit puzzled about this case here. Any thoughts?
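
The numbers in the mismatch message can be checked by hand: the "missing" value is the set of bits present in the peer's mask but absent from the client's. A minimal sketch in shell arithmetic, assuming both masks are the hex strings printed in the log line above (missing_features is a hypothetical helper name, not a Ceph command):

```shell
# Hypothetical helper: print, in hex, the feature bits that are set in the
# peer's mask ($2) but not in our own mask ($1). Both arguments are hex
# strings without a leading 0x, as they appear in the log message.
missing_features() {
    mine=$((0x$1))
    peer=$((0x$2))
    printf '%x\n' $(( peer & ~mine ))
}

# Masks taken from the log line in this message.
missing_features fffffffff 1fffffffff   # prints 1000000000
```

The result is a single bit (bit 36, counting from 0), so the client side lacks exactly one feature bit that the monitor has set.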

Regards

Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
