Gregory, I greatly appreciate your assistance. I recompiled Ceph with -ssl and the nss USE flags set, which is the opposite of what I was using. I am now able to export from our pools without signature check failures. Thank you for pointing me in the right direction.

Cary
-Dynamic

On Fri, Feb 16, 2018 at 11:29 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Thu, Feb 15, 2018 at 10:28 AM Cary <dynamic.cary@xxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I have enabled debugging on my MONs and OSDs to help troubleshoot these signature check failures. I was watching osd.4's log and saw these errors when the signature check failure happened:
>>
>> 2018-02-15 18:06:29.235791 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_bulk peer close file descriptor 81
>> 2018-02-15 18:06:29.235832 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read failed
>> 2018-02-15 18:06:29.235841 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).process read tag failed
>> 2018-02-15 18:06:29.235848 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).fault on lossy channel, failing
>> 2018-02-15 18:06:29.235966 7f8bc0853700 2 osd.8 27498 ms_handle_reset con 0x55f802746000 session 0x55f8063b3180
>>
>> Could someone please look at this? We have 3 different Ceph clusters set up and they all have this issue. This cluster is running Gentoo and Ceph version 12.2.2-r1. The other two clusters are 12.2.2. Exporting images causes signature check failures, and with larger files it segfaults as well.
>>
>> When exporting the image from osd.4, this message shows up as well:
>> Exporting image: 1% complete...2018-02-15 18:14:05.283708 7f6834277700 0 -- 192.168.173.44:0/122241099 >> 192.168.173.44:6801/72152 conn(0x7f681400ff10 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
>>
>> The errors below show up on all OSD/MGR/MON nodes when exporting an image:
>> Exporting image: 8% complete...2018-02-15 18:15:51.419437 7f2b64ac0700 0 SIGN: MSG 28 Message signature does not match contents.
>> 2018-02-15 18:15:51.419459 7f2b64ac0700 0 SIGN: MSG 28Signature on message:
>> 2018-02-15 18:15:51.419460 7f2b64ac0700 0 SIGN: MSG 28 sig: 8338581684421737157
>> 2018-02-15 18:15:51.419469 7f2b64ac0700 0 SIGN: MSG 28Locally calculated signature:
>> 2018-02-15 18:15:51.419470 7f2b64ac0700 0 SIGN: MSG 28 sig_check:5913182128308244
>> 2018-02-15 18:15:51.419471 7f2b64ac0700 0 Signature failed.
>> 2018-02-15 18:15:51.419472 7f2b64ac0700 0 -- 192.168.173.44:0/3919097436 >> 192.168.173.44:6801/72152 conn(0x7f2b4800ff10 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=39 cs=1 l=1).process Signature check failed
>>
>> Our VMs crash when writing to disk. Libvirt's logs just say the VM crashed. This is a blocker. Has anyone else seen this? This seems to be an issue with Ceph Luminous, as we were not having these problems with Jewel.
>
> When I search through my email, the only two reports of failed signatures are people who in fact had misconfiguration issues resulting in one end using signatures and the other side not.
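For anyone checking that possibility, the quickest test is to compare the cephx signature settings on both ends of a connection. A minimal sketch, assuming a Luminous cluster with admin sockets in the default location; the option names are the standard cephx ones, but confirm them against your own build:

    # On an OSD/MON host, ask the running daemon which cephx settings it is using:
    ceph daemon osd.4 config show | grep cephx

    # The client doing the rbd export usually has no admin socket,
    # so check the [global]/[client] sections of its ceph.conf instead:
    grep -i cephx /etc/ceph/ceph.conf

    # Both sides should agree on cephx_sign_messages and the
    # cephx_*_require_signatures options; a mismatch fits the symptom above.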
>
> Given that, and since you're on Gentoo and presumably compiled the packages yourself, the most likely explanation I can think of is something that went wrong between your packages and the compilation. :/
>
> I guess you could try switching from libnss to libcryptopp (or vice versa) by recompiling with the relevant makeflags if you want to do something that only involves the Ceph code. Otherwise, do a rebuild?
>
> Sadly I don't think there's much else we can suggest, given that nobody has seen this with binary packages blessed by the upstream or a distribution.
> -Greg
>
>>
>> Cary
>> -Dynamic
>>
>> On Thu, Feb 1, 2018 at 7:04 PM, Cary <dynamic.cary@xxxxxxxxx> wrote:
>> > Hello,
>> >
>> > I did not do anything special that I know of. I was just exporting an image from OpenStack. We have recently upgraded from Jewel 10.2.3 to Luminous 12.2.2.
>> >
>> > Caps for admin:
>> > client.admin
>> >     key: CENSORED
>> >     auid: 0
>> >     caps: [mgr] allow *
>> >     caps: [mon] allow *
>> >     caps: [osd] allow *
>> >
>> > Caps for Cinder:
>> > client.cinder
>> >     key: CENSORED
>> >     caps: [mgr] allow r
>> >     caps: [mon] profile rbd, allow command "osd blacklist"
>> >     caps: [osd] profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images
>> >
>> > Caps for MGR:
>> > mgr.0
>> >     key: CENSORED
>> >     caps: [mon] allow *
>> >
>> > I believe this is causing the virtual machines we have running to crash. Any advice would be appreciated. Please let me know if I need to provide any other details. Thank you,
>> >
>> > Cary
>> > -Dynamic
>> >
>> > On Mon, Jan 29, 2018 at 7:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >> On Fri, Jan 26, 2018 at 12:14 PM Cary <dynamic.cary@xxxxxxxxx> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> We are running Luminous 12.2.2: 6 OSD hosts with 12 1TB OSDs and 64GB RAM. Each host has an SSD for Bluestore's block.wal and block.db. There are 5 monitor nodes as well, with 32GB RAM. All servers run Gentoo with kernel 4.12.12-gentoo.
>> >>>
>> >>> When I export an image using:
>> >>> rbd export pool-name/volume-name /location/image-name.raw
>> >>>
>> >>> messages similar to those below are displayed. The signature check fails randomly, and sometimes there is a message about a bad authorizer, but not every time. The image is still exported successfully.
>> >>>
>> >>> 2018-01-24 17:35:15.616080 7fc8d4024700 0 cephx: verify_authorizer_reply bad nonce got 4552544084014661633 expected 4552499520046621785 sent 4552499520046621784
>> >>> 2018-01-24 17:35:15.616098 7fc8d4024700 0 -- 172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1)._process_connection failed verifying authorize reply
>> >>> 2018-01-24 17:35:15.699004 7fc8d4024700 0 SIGN: MSG 2 Message signature does not match contents.
>> >>> 2018-01-24 17:35:15.699020 7fc8d4024700 0 SIGN: MSG 2Signature on message:
>> >>> 2018-01-24 17:35:15.699021 7fc8d4024700 0 SIGN: MSG 2 sig: 8189090775647585001
>> >>> 2018-01-24 17:35:15.699047 7fc8d4024700 0 SIGN: MSG 2Locally calculated signature:
>> >>> 2018-01-24 17:35:15.699048 7fc8d4024700 0 SIGN: MSG 2 sig_check:140500325643792
>> >>> 2018-01-24 17:35:15.699049 7fc8d4024700 0 Signature failed.
>> >>> 2018-01-24 17:35:15.699050 7fc8d4024700 0 -- 172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106 conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=26018 cs=1 l=1).process Signature check failed
>> >>>
>> >>> Does anyone know what could cause this, and what I can do to fix it?
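One way to gather more evidence while reproducing this is to raise the messenger and auth debug levels on the OSDs during an export and watch their logs, similar to the debugging Cary describes enabling in the Feb 15 message quoted above. A sketch only: the pool and image names are placeholders, and the reset values assume the Luminous defaults.

    # Raise messenger/auth logging on the OSDs, reproduce, then restore defaults.
    ceph tell osd.* injectargs '--debug_ms 1 --debug_auth 20'
    rbd export volumes/some-volume /tmp/some-volume.raw    # placeholder names
    ceph tell osd.* injectargs '--debug_ms 0/5 --debug_auth 1/5'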
>> >>
>> >> That's in the cephx authentication code, and it's indicating that the secure signature sent with the message isn't what the local node thinks it should be. That's pretty odd (a bit flip or something that could actually change it ought to trigger the messaging checksums directly) and I'm not quite sure how it could happen.
>> >>
>> >> But, as you've noticed, it retries and apparently succeeds. How did you notice this?
>> >> -Greg
>> >>
>> >>>
>> >>> Thank you,
>> >>>
>> >>> Cary
>> >>> -Dynamic
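For completeness, the fix reported at the top of this thread (rebuilding Ceph with the nss USE flag enabled and ssl disabled) would look roughly like this on Gentoo. A sketch only: the package.use file name is arbitrary, and the exact crypto-related USE flags exposed by the sys-cluster/ceph ebuild should be confirmed first.

    # Check which USE flags the installed ebuild exposes (equery is from app-portage/gentoolkit).
    equery uses sys-cluster/ceph

    # Select nss and drop ssl for Ceph, then rebuild just that package.
    echo "sys-cluster/ceph nss -ssl" >> /etc/portage/package.use/ceph
    emerge --ask --oneshot sys-cluster/ceph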