SOLVED: upgrading to Luminous v12.1.2 put a stop to the OSD crashes in cephx_verify_authorizer().
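For anyone who wants to check whether a daemon is still on the affected release candidate, here is a minimal sketch. It only encodes what this thread reports: 12.1.1 crashed on this cluster and 12.1.2 did not. The names FIXED_VERSION, parse_ceph_version and needs_upgrade are hypothetical helpers, not part of Ceph, and nothing is implied about releases older than 12.1.1.

```python
import re

# Assumption taken from this thread only: 12.1.2 stopped the crashes seen on 12.1.1.
FIXED_VERSION = (12, 1, 2)

# Matches a version banner such as
# "ceph version 12.1.1 (f3e663a1...) luminous (rc)".
VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def parse_ceph_version(text):
    """Return (major, minor, patch) from a version banner, or None if absent."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_upgrade(text, fixed=FIXED_VERSION):
    """True if the banner in `text` reports a version older than `fixed`."""
    version = parse_ceph_version(text)
    if version is None:
        raise ValueError("no 'ceph version X.Y.Z' banner found")
    # Tuples compare element by element, so (12, 1, 1) < (12, 1, 2).
    return version < fixed
```

Feeding it the banner line from the log quoted below should flag that OSD as needing the upgrade.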
On Fri, Jul 21, 2017 at 3:21 AM Jens Harbott <j.harbott@xxxxxxxx> wrote:
2017-07-21 1:14 GMT+00:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
> At a glance that looks like the bug fixed by just-merged
> https://github.com/ceph/ceph/pull/16421
Given the crashes in cephx_verify_authorizer(), this looks to me more
like an instance of http://tracker.ceph.com/issues/20667, with
https://github.com/ceph/ceph/pull/16455 as the proposed fix. See Sage's
mail on ceph-dev earlier.
> On Thu, Jul 20, 2017 at 1:02 PM Roger Brown <rogerpbrown@xxxxxxxxx> wrote:
...
>> Representative example from osd1 logs:
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: *** Caught signal (Segmentation fault) **
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: in thread 7f52960e7700 thread_name:msgr-worker-2
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: 2017-07-20 13:42:18.658076 7f529bf85c80 -1 osd.3 3444 log_to_monitors {default=true}
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: 2017-07-20 13:42:18.662695 7f52968e8700 -1 failed to decode message of type 70 v3: buffer::malformed_input: void osd_peer_stat_t::decode(ceph::buffer::list::iterator&) no longer understand old encoding version 1 < struct_compat
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: 1: (()+0xa257a4) [0x55bc98fe27a4]
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: 2: (()+0x11390) [0x7f529a468390]
>> Jul 20 13:42:18 osd1 ceph-osd[4035]: 3: (cephx_verify_authorizer(CephContext*, KeyStore*, ceph::buffer::list::iterator&, CephXServiceTicketInfo&, ceph::buffer::list&)+0x496) [0x55bc991b0ca6]
...
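To tell this crash apart from other OSD segfaults, a rough sketch like the following can scan journal output for the same signature: a "Caught signal (Segmentation" line from ceph-osd followed by a backtrace frame in cephx_verify_authorizer(). The function name find_cephx_segfaults is hypothetical, and the regular expression assumes raw syslog-style lines in the format shown above (no ">>" quoting).

```python
import re

# Start of a ceph-osd segfault report in syslog/journal format, e.g.
# "Jul 20 13:42:18 osd1 ceph-osd[4035]: *** Caught signal (Segmentation fault) **".
SEGV_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"ceph-osd\[(?P<pid>\d+)\]: \*\*\* Caught signal \(Segmentation"
)


def find_cephx_segfaults(lines):
    """Return (when, host, pid) for each segfault whose following lines
    contain a cephx_verify_authorizer( backtrace frame."""
    hits = []
    current = None  # most recent segfault still waiting for a matching frame
    for line in lines:
        match = SEGV_RE.match(line)
        if match:
            current = (match.group("when"), match.group("host"),
                       int(match.group("pid")))
            continue
        if current is not None and "cephx_verify_authorizer(" in line:
            hits.append(current)
            current = None
    return hits
```

A segfault with no such frame after it is simply ignored, so unrelated crashes do not show up in the result.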
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com