Re: Signature check failures.

Gregory,


I greatly appreciate your assistance. I recompiled Ceph with the -ssl and
nss USE flags set, the opposite of what I had been using. I am now able to
export from our pools without signature check failures. Thank you for
pointing me in the right direction.
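
For anyone else hitting this on Gentoo, the change amounted to something
like the following -- a sketch from my setup, so check the flag names with
equery uses ceph against your own ebuild before copying it:

  # /etc/portage/package.use/ceph
  # select NSS and drop the ssl flag (the combination that worked for me)
  sys-cluster/ceph nss -ssl

  emerge --ask --oneshot sys-cluster/ceph

followed by restarting the Ceph daemons on every node.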

Cary
-Dynamic



On Fri, Feb 16, 2018 at 11:29 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Thu, Feb 15, 2018 at 10:28 AM Cary <dynamic.cary@xxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I have enabled debugging on my MONs and OSDs to help troubleshoot
>> these signature check failures. I was watching osd.4's log and saw
>> these errors when the signature check failure happened.
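>>
>> (For reference, I raised the messenger and auth debug levels roughly
>> like this -- assuming injectargs takes these options on 12.2.2 the
>> same way it does elsewhere:
>>
>>   ceph tell osd.* injectargs '--debug-ms 1 --debug-auth 20'
>>
>> and the same per monitor, reverting to the defaults afterwards.)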
>>
>> 2018-02-15 18:06:29.235791 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_bulk peer
>> close file descriptor 81
>> 2018-02-15 18:06:29.235832 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).read_until read
>> failed
>> 2018-02-15 18:06:29.235841 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).process read
>> tag failed
>> 2018-02-15 18:06:29.235848 7f8bca7de700  1 --
>> 192.168.173.44:6806/72264 >> 192.168.173.42:0/4264467021
>> conn(0x55f802746000 :6806 s=STATE_OPEN pgs=7 cs=1 l=1).fault on lossy
>> channel, failing
>> 2018-02-15 18:06:29.235966 7f8bc0853700  2 osd.8 27498 ms_handle_reset
>> con 0x55f802746000 session 0x55f8063b3180
>>
>>
>>  Could someone please look at this? We have 3 different Ceph clusters
>> set up, and they all have this issue. This cluster is running Gentoo and
>> Ceph version 12.2.2-r1; the other two clusters are on 12.2.2. Exporting
>> images causes signature check failures, and with larger files it
>> segfaults as well.
>>
>> When exporting the image from osd.4, this message shows up as well.
>> Exporting image: 1% complete...2018-02-15 18:14:05.283708 7f6834277700
>>  0 -- 192.168.173.44:0/122241099 >> 192.168.173.44:6801/72152
>> conn(0x7f681400ff10 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
>> pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER
>>
>> The errors below show up on all OSD/MGR/MON nodes when exporting an image.
>> Exporting image: 8% complete...2018-02-15 18:15:51.419437 7f2b64ac0700
>>  0 SIGN: MSG 28 Message signature does not match contents.
>> 2018-02-15 18:15:51.419459 7f2b64ac0700  0 SIGN: MSG 28Signature on
>> message:
>> 2018-02-15 18:15:51.419460 7f2b64ac0700  0 SIGN: MSG 28    sig:
>> 8338581684421737157
>> 2018-02-15 18:15:51.419469 7f2b64ac0700  0 SIGN: MSG 28Locally
>> calculated signature:
>> 2018-02-15 18:15:51.419470 7f2b64ac0700  0 SIGN: MSG 28
>> sig_check:5913182128308244
>> 2018-02-15 18:15:51.419471 7f2b64ac0700  0 Signature failed.
>> 2018-02-15 18:15:51.419472 7f2b64ac0700  0 --
>> 192.168.173.44:0/3919097436 >> 192.168.173.44:6801/72152
>> conn(0x7f2b4800ff10 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
>> pgs=39 cs=1 l=1).process Signature check failed
>>
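>> (If it would help narrow things down, I can temporarily disable message
>> signing on a test cluster -- strictly as a diagnostic, since it weakens
>> cephx -- assuming the option behaves on 12.2.2 as documented:
>>
>>   [global]
>>       cephx sign messages = false
>>
>> set on both ends, with a daemon restart.)
>>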
>> Our VMs crash when writing to disk; libvirt's logs just say the VM
>> crashed. This is a blocker. Has anyone else seen this? It seems to be
>> an issue with Ceph Luminous, as we were not having these problems with
>> Jewel.
>
>
> When I search through my email, the only two reports of failed signatures
> are from people who in fact had misconfiguration issues resulting in one
> end using signatures and the other side not.
>
> Given that, and since you're on Gentoo and presumably compiled the packages
> yourself, the most likely explanation I can think of is that something went
> wrong between your packages and the compilation. :/
>
> I guess you could try switching from libnss to libcryptopp (or vice versa)
> by recompiling with the relevant build flags, if you want to do something
> that only involves the Ceph code. Otherwise, do a rebuild?
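>
> As a quick sanity check on which crypto library your binaries actually
> linked against, something like this should do -- the path is an
> assumption, adjust for your install:
>
>   ldd /usr/bin/ceph-osd | egrep -i 'nss|crypto|ssl'
>
> Compare the output across hosts; a mismatch would line up with the
> one-end-signing theory above.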
>
> Sadly I don't think there's much else we can suggest given that nobody has
> seen this with binary packages blessed by the upstream or a distribution.
> -Greg
>
>>
>>
>> Cary
>> -Dynamic
>>
>> On Thu, Feb 1, 2018 at 7:04 PM, Cary <dynamic.cary@xxxxxxxxx> wrote:
>> > Hello,
>> >
>> > I did not do anything special that I know of. I was just exporting an
>> > image from OpenStack. We have recently upgraded from Jewel 10.2.3 to
>> > Luminous 12.2.2.
>> >
>> > Caps for admin:
>> > client.admin
>> >         key: CENSORED
>> >         auid: 0
>> >         caps: [mgr] allow *
>> >         caps: [mon] allow *
>> >         caps: [osd] allow *
>> >
>> > Caps for Cinder:
>> > client.cinder
>> >         key: CENSORED
>> >         caps: [mgr] allow r
>> >         caps: [mon] profile rbd, allow command "osd blacklist"
>> >         caps: [osd] profile rbd pool=vms, profile rbd pool=volumes,
>> > profile rbd pool=images
>> >
>> > Caps for MGR:
>> > mgr.0
>> >         key: CENSORED
>> >         caps: [mon] allow *
>> >
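>> > (The caps above came from the usual auth queries, something like:
>> >
>> >   ceph auth get client.admin
>> >   ceph auth get client.cinder
>> >   ceph auth get mgr.0
>> >
>> > with the keys censored by hand.)
>> >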
>> > I believe this is causing the virtual machines we have running to
>> > crash. Any advice would be appreciated. Please let me know if I need
>> > to provide any other details. Thank you,
>> >
>> > Cary
>> > -Dynamic
>> >
>> > On Mon, Jan 29, 2018 at 7:53 PM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> > wrote:
>> >> On Fri, Jan 26, 2018 at 12:14 PM Cary <dynamic.cary@xxxxxxxxx> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> We are running Luminous 12.2.2: 6 OSD hosts, each with 12 1TB OSDs
>> >>> and 64GB of RAM. Each host has an SSD for BlueStore's block.wal and
>> >>> block.db. There are 5 monitor nodes as well with 32GB of RAM. All
>> >>> servers run Gentoo with kernel 4.12.12-gentoo.
>> >>>
>> >>> When I export an image using:
>> >>> rbd export pool-name/volume-name /location/image-name.raw
>> >>>
>> >>> messages similar to those below are displayed. The signature check
>> >>> fails randomly, and sometimes there is a message about a bad
>> >>> authorizer, but not every time. The image is still exported
>> >>> successfully.
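>> >>>
>> >>> (To capture these without the progress output in the way, I rerun
>> >>> the export with client-side debugging raised -- assuming rbd
>> >>> accepts these config overrides on the command line like other Ceph
>> >>> tools:
>> >>>
>> >>>   rbd export pool-name/volume-name /location/image-name.raw \
>> >>>       --debug-ms 1 --debug-auth 20 --log-file /tmp/rbd-export.log )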
>> >>>
>> >>> 2018-01-24 17:35:15.616080 7fc8d4024700  0 cephx:
>> >>> verify_authorizer_reply bad nonce got 4552544084014661633 expected
>> >>> 4552499520046621785 sent 4552499520046621784
>> >>> 2018-01-24 17:35:15.616098 7fc8d4024700  0 --
>> >>> 172.21.32.16:0/1412094654 >> 172.21.32.6:6802/6219 conn(0x7fc8b0078a50
>> >>> :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0
>> >>> l=1)._process_connection failed verifying authorize reply
>> >>> 2018-01-24 17:35:15.699004 7fc8d4024700  0 SIGN: MSG 2 Message
>> >>> signature does not match contents.
>> >>> 2018-01-24 17:35:15.699020 7fc8d4024700  0 SIGN: MSG 2Signature on
>> >>> message:
>> >>> 2018-01-24 17:35:15.699021 7fc8d4024700  0 SIGN: MSG 2    sig:
>> >>> 8189090775647585001
>> >>> 2018-01-24 17:35:15.699047 7fc8d4024700  0 SIGN: MSG 2Locally
>> >>> calculated signature:
>> >>> 2018-01-24 17:35:15.699048 7fc8d4024700  0 SIGN: MSG 2
>> >>> sig_check:140500325643792
>> >>> 2018-01-24 17:35:15.699049 7fc8d4024700  0 Signature failed.
>> >>> 2018-01-24 17:35:15.699050 7fc8d4024700  0 --
>> >>> 172.21.32.16:0/1412094654 >> 172.21.32.2:6807/153106
>> >>> conn(0x7fc8bc020870 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
>> >>> pgs=26018 cs=1 l=1).process Signature check failed
>> >>>
>> >>> Does anyone know what could cause this, and what I can do to fix it?
>> >>
>> >>
>> >> That's in the cephx authentication code and it's indicating that the
>> >> secure
>> >> signature sent with the message isn't what the local node thinks it
>> >> should
>> >> be. That's pretty odd (a bit flip or something that could actually
>> >> change it
>> >> ought to trigger the messaging checksums directly) and I'm not quite
>> >> sure
>> >> how it could happen.
>> >>
>> >> But, as you've noticed, it retries and apparently succeeds. How did you
>> >> notice this?
>> >> -Greg
>> >>
>> >>>
>> >>>
>> >>> Thank you,
>> >>>
>> >>> Cary
>> >>> -Dynamic
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


