Re: 10.2.3: Howto disable cephx_sign_messages and preventing a LogFlood

On Wed, Dec 14, 2016 at 5:10 PM, Bjoern Laessig
<b.laessig@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> I triggered a kernel bug in the Ceph krbd code:
>  * http://www.spinics.net/lists/ceph-devel/msg33802.html

The fix is ready and is set to be merged into 4.10-rc1.

How often can you hit it?

>
> Ilya Dryomov wrote in a reply that the way to go is “disabling cephx
> message signing”.

That's one option.  Lightening the load on the client machine is
another: this bug should only surface under high memory pressure.

>
> I have a lot of systems that use librbd and only 2 hosts that use krbd.
> I do not want to disable message signing completely, so I tried:
>
>   rbd map mypool/myrbd --id dev --keyring /etc/ceph/ceph.client.dev.keyring --options nocephx_sign_messages
>
> which led to endless waiting. Strace says:
>
> [pid 10845] open("/sys/bus/rbd/add_single_major", O_WRONLY) = 4
> [pid 10845] write(4, "[2001:67c:670:100:5054:ff:fe78:2a86]:6789,[2001:67c:670:100:5054:ff:fecf:22f6]:6789,[2001:67c:670:100:5054:ff:feae:42ee]:6789 name=dev,key=client.dev,nocephx_sign_messages ptxdev WORK_CEPH_BLA -", 194 <unfinished ...>
> [pid 10899] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3, {1481729207, 622660011}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 5, {1481729212, 622818887}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7, {1481729217, 622973709}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 9, {1481729222, 623078200}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11, {1481729227, 623190181}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 13, {1481729232, 623298560}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15, {1481729237, 623504923}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17, {1481729242, 623709707}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 10899] futex(0x55f66389555c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 19, {1481729247, 623816985}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 10899] futex(0x55f663895508, FUTEX_WAKE_PRIVATE, 1) = 0
> (... cycle endless )
>
> After the rbd map option failed, I set in /etc/ceph/ceph.conf:
>> [global]
>> cephx_sign_messages = false
>
> Now I have one ceph-osd process that logs an enormous number of lines
> (4 GBytes/hour) like:
>
>
>> 2016-12-14 07:49:40.111415 7f25f0c0a700  0 SIGN: MSG 1 Sender did not set CEPH_MSG_FOOTER_SIGNED.
>> 2016-12-14 07:49:40.111422 7f25f0c0a700  0 SIGN: MSG 1 Message signature does not match contents.
>> 2016-12-14 07:49:40.111423 7f25f0c0a700  0 SIGN: MSG 1Signature on message:
>> 2016-12-14 07:49:40.111425 7f25f0c0a700  0 SIGN: MSG 1    sig: 0
>> 2016-12-14 07:49:40.111427 7f25f0c0a700  0 SIGN: MSG 1Locally calculated signature:
>> 2016-12-14 07:49:40.111428 7f25f0c0a700  0 SIGN: MSG 1    sig_check:1990181984681779795
>> 2016-12-14 07:49:40.111429 7f25f0c0a700  0 Signature failed.
>> 2016-12-14 07:49:40.111430 7f25f0c0a700  0 -- [myOsdNode1.ipv6.address]:6802/29624 >> [my_kRBD_client.ipv6.address]:0/943150662 pipe(0x5603813bd400 sd=121 :6802 s=2 pgs=353544735 cs=1 l=1 c=0x560381182900).Signature check failed
>
> Then I read on:
>   http://docs.ceph.com/docs/jewel/rados/troubleshooting/log-and-debug/
> that “Ceph’s logging levels operate on a scale of 1 to 20, where 1 is
> terse and 20 is verbose.” The messages I got had a log level of 0, so I
> set the log level to -1 to get rid of them:
>
>> ceph tell osd.3 injectargs --debug-auth -1
>> ceph tell osd.3 injectargs --debug-ms -1
>
> Now I no longer have to delete the logfiles every 12 hours, so the
> immediate pain is gone, but it is a workaround for a workaround, and
> that is painful too. What could I do to disable cephx message signing
> only for the krbd clients?

I don't think you can enable/disable message signing on a per-connection
basis - once the feature bit is negotiated, messengers on
both sides expect everything to be signed.  Feature bits are static and
the MSG_AUTH feature bit has been enabled since bobtail and kernel 3.19.
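
To make the "both sides expect everything to be signed" point concrete, here is a toy Python model of the behavior described above. It is not Ceph code: `Peer`, `negotiated`, `accepts`, `connection_works` and the `MSG_AUTH` bit value are hypothetical, and the model assumes (a) a connection uses only the feature bits both sides advertise and (b) a side with signing disabled ignores signatures rather than rejecting them.

```python
from dataclasses import dataclass

# Hypothetical bit value; the real CEPH_FEATURE_MSG_AUTH constant is not needed here.
MSG_AUTH = 1 << 0


@dataclass
class Peer:
    name: str
    features: int        # static feature bits this side advertises
    sign_messages: bool  # this side's cephx_sign_messages setting


def negotiated(a: Peer, b: Peer) -> int:
    # Assumption: a connection uses only the feature bits both sides support.
    return a.features & b.features


def accepts(sender: Peer, receiver: Peer) -> bool:
    """Would `receiver` accept a message sent by `sender`?"""
    if not (negotiated(sender, receiver) & MSG_AUTH):
        return True  # signing is not in play on this connection
    signed = sender.sign_messages    # sender signs only if its setting allows
    checks = receiver.sign_messages  # receiver verifies only if its setting allows
    # An unsigned message arriving at a side that verifies is rejected.
    return signed or not checks


def connection_works(a: Peer, b: Peer) -> bool:
    # Messages flow in both directions, so both must be accepted.
    return accepts(a, b) and accepts(b, a)


if __name__ == "__main__":
    krbd = Peer("krbd client, -o nocephx_sign_messages", MSG_AUTH, False)
    stale_osd = Peer("osd.3, not restarted", MSG_AUTH, True)
    restarted_osd = Peer("osd.3, restarted", MSG_AUTH, False)
    print(connection_works(krbd, stale_osd))      # False
    print(connection_works(krbd, restarted_osd))  # True
```

In this model, once MSG_AUTH is negotiated, a connection only works when both sides have the same `sign_messages` setting. The "stale OSD" case resembles the log flood above: the OSD still verifies, the krbd client sends unsigned messages, and every message fails the check.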

It has to be disabled both on the server side (via ceph.conf, all
daemons need to be restarted) and on the client side (via rbd map -o
nocephx_sign_messages).
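
The server-side half can be sanity-checked before restarting daemons. Below is a minimal Python sketch, assuming that Ceph treats spaces and underscores in option names as equivalent and that a more specific section (`[osd.3]`) overrides `[osd]`, which overrides `[global]`. `effective_option`, `signing_disabled` and `rbd_map_argv` are hypothetical helpers, and the set of strings treated as "false" is an assumption.

```python
import configparser
from typing import Optional

# Strings assumed to mean "false" for a boolean option (illustrative).
FALSE_VALUES = {"false", "0", "no", "off"}


def _normalize(key: str) -> str:
    # Assumption: "cephx sign messages" and "cephx_sign_messages" name the same option.
    return key.strip().lower().replace(" ", "_")


def effective_option(conf_text: str, daemon: str, option: str) -> Optional[str]:
    """Value of `option` for `daemon` (e.g. "osd.3") as written in ceph.conf.

    Sections are applied from least to most specific:
    [global], then [osd], then [osd.3]; the last match wins.
    """
    # Drop indentation so configparser does not treat lines as continuations.
    text = "\n".join(line.strip() for line in conf_text.splitlines())
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, inline_comment_prefixes=("#", ";")
    )
    parser.read_string(text)

    daemon_type = daemon.split(".", 1)[0]
    value = None
    for section in ("global", daemon_type, daemon):
        if parser.has_section(section):
            for key, val in parser.items(section):
                if _normalize(key) == _normalize(option):
                    value = val.strip()
    return value


def signing_disabled(conf_text: str, daemon: str) -> bool:
    value = effective_option(conf_text, daemon, "cephx_sign_messages")
    return value is not None and value.lower() in FALSE_VALUES


def rbd_map_argv(image: str, client_id: str, keyring: str) -> list:
    # Client half: ask the kernel client not to sign messages.
    return ["rbd", "map", image, "--id", client_id,
            "--keyring", keyring, "-o", "nocephx_sign_messages"]


if __name__ == "__main__":
    conf = """
[global]
    cephx sign messages = false
[osd.3]
    cephx_sign_messages = true   # leftover override
"""
    print(signing_disabled(conf, "osd.1"))  # True
    print(signing_disabled(conf, "osd.3"))  # False
```

This only reads the file. A daemon that has not been restarted keeps running with its old value, which can be queried on its host through the admin socket, e.g. `ceph daemon osd.3 config get cephx_sign_messages`.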

Suppressing logging is obviously the wrong thing to do here ;)

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


