Hello,

I have enabled debugging on my MONs and OSDs to help troubleshoot these
signature check failures (rough commands at the end of this mail). I was
watching osd.4's log and saw these errors when the signature check failure
happened:

2018-02-15 18:06:29.235791 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_bulk peer close file descriptor 81
2018-02-15 18:06:29.235832 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read failed
2018-02-15 18:06:29.235841 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).process read tag failed
2018-02-15 18:06:29.235848 7f8bca7de700 1 -- 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021 conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).fault on lossy channel, failing
2018-02-15 18:06:29.235966 7f8bc0853700 2 osd.8 27498 ms_handle_reset con 0x55f802746000 session 0x55f8063b3180

Could someone please look at this? We have 3 different Ceph clusters set up
and they all have this issue. This cluster is running Gentoo and Ceph
version 12.2.2-r1; the other two clusters are on 12.2.2. Exporting images
causes signature check failures, and with larger files it segfaults as well.

When exporting the image from osd.4, this message shows up as well:

Exporting image: 1% complete...2018-02-15 18:14:05.283708 7f6834277700 0 -- 192.168.173.44:0/122241099 >> 192.168.173.44:6801/72152 conn(0x7f681400ff10 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER

The errors below show up on all OSD/MGR/MON nodes when exporting an image:

Exporting image: 8% complete...2018-02-15 18:15:51.419437 7f2b64ac0700 0 SIGN: MSG 28 Message signature does not match contents.
2018-02-15 18:15:51.419459 7f2b64ac0700 0 SIGN: MSG 28Signature on message:
2018-02-15 18:15:51.419460 7f2b64ac0700 0 SIGN: MSG 28 sig: 8338581684421737157
2018-02-15 18:15:51.419469 7f2b64ac0700 0 SIGN: MSG 28Locally calculated signature:
2018-02-15 18:15:51.419470 7f2b64ac0700 0 SIGN: MSG 28 sig_check:5913182128308244
2018-02-15 18:15:51.419471 7f2b64ac0700 0 Signature failed.
2018-02-15 18:15:51.419472 7f2b64ac0700 0 -- 192.168.173.44:0/3919097436 >> 192.168.173.44:6801/72152 conn(0x7f2b4800ff10 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=39 cs=1 l=1).process Signature check failed

Our VMs crash when writing to disk. Libvirt's logs just say the VM crashed.
This is a blocker. Has anyone else seen this? This seems to be an issue with
Ceph Luminous, as we were not having these problems with Jewel.
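In case it helps anyone reproduce this, the debugging was raised with the
usual injectargs mechanism, along these lines (the levels here are only an
example, adjust as needed):

  ceph tell osd.* injectargs '--debug_ms 1 --debug_auth 20'
  ceph tell mon.* injectargs '--debug_ms 1 --debug_auth 20'

or persistently in ceph.conf:

  [global]
  debug ms = 1
  debug auth = 20

injectargs takes effect on the running daemons immediately; the ceph.conf
entries keep the levels across daemon restarts.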
Cary
-Dynamic

On Thu, Feb 1, 2018 at 7:04 PM, Cary <dynamic.cary@xxxxxxxxx> wrote:
> Hello,
>
> I did not do anything special that I know of. I was just exporting an
> image from OpenStack. We have recently upgraded from Jewel 10.2.3 to
> Luminous 12.2.2.
>
> Caps for admin:
> client.admin
>     key: CENSORED
>     auid: 0
>     caps: [mgr] allow *
>     caps: [mon] allow *
>     caps: [osd] allow *
>
> Caps for Cinder:
> client.cinder
>     key: CENSORED
>     caps: [mgr] allow r
>     caps: [mon] profile rbd, allow command "osd blacklist"
>     caps: [osd] profile rbd pool=vms, profile rbd pool=volumes, profile rbd pool=images
>
> Caps for MGR:
> mgr.0
>     key: CENSORED
>     caps: [mon] allow *
>
> I believe this is causing the virtual machines we have running to
> crash. Any advice would be appreciated. Please let me know if I need
> to provide any other details.
>
> Thank you,
>
> Cary
> -Dynamic
>
> On Mon, Jan 29, 2018 at 7:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> On Fri, Jan 26, 2018 at 12:14 PM Cary <dynamic.cary@xxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> We are running Luminous 12.2.2: 6 OSD hosts with 12 1TB OSDs and 64GB
>>> RAM. Each host has an SSD for Bluestore's block.wal and block.db.
>>> There are 5 monitor nodes as well, with 32GB RAM. All servers run
>>> Gentoo with kernel 4.12.12-gentoo.
>>>
>>> When I export an image using:
>>> rbd export pool-name/volume-name /location/image-name.raw
>>>
>>> messages similar to those below are displayed. The signature check fails
>>> randomly, and sometimes there is a message about a bad authorizer, but
>>> not every time. The image is still exported successfully.
>>>
>>> 2018-01-24 17:35:15.616080 7fc8d4024700 0 cephx: verify_authorizer_reply bad nonce got 4552544084014661633 expected 4552499520046621785 sent 4552499520046621784
>>> 2018-01-24 17:35:15.616098 7fc8d4024700 0 -- 172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1)._process_connection failed verifying authorize reply
>>> 2018-01-24 17:35:15.699004 7fc8d4024700 0 SIGN: MSG 2 Message signature does not match contents.
>>> 2018-01-24 17:35:15.699020 7fc8d4024700 0 SIGN: MSG 2Signature on message:
>>> 2018-01-24 17:35:15.699021 7fc8d4024700 0 SIGN: MSG 2 sig: 8189090775647585001
>>> 2018-01-24 17:35:15.699047 7fc8d4024700 0 SIGN: MSG 2Locally calculated signature:
>>> 2018-01-24 17:35:15.699048 7fc8d4024700 0 SIGN: MSG 2 sig_check:140500325643792
>>> 2018-01-24 17:35:15.699049 7fc8d4024700 0 Signature failed.
>>> 2018-01-24 17:35:15.699050 7fc8d4024700 0 -- 172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106 conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=26018 cs=1 l=1).process Signature check failed
>>>
>>> Does anyone know what could cause this, and what I can do to fix it?
>>
>> That's in the cephx authentication code and it's indicating that the secure
>> signature sent with the message isn't what the local node thinks it should
>> be. That's pretty odd (a bit flip or something that could actually change it
>> ought to trigger the messaging checksums directly) and I'm not quite sure
>> how it could happen.
>>
>> But, as you've noticed, it retries and apparently succeeds. How did you
>> notice this?
>> -Greg
>>
>>>
>>> Thank you,
>>>
>>> Cary
>>> -Dynamic
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com