Re: ceph-mgr SIGABRTs on startup after cluster upgrade from Kraken to Luminous

Looks like there is a tracker issue open for this.

http://tracker.ceph.com/issues/21197

Please add your details there.
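
If you can reproduce the crash, a log from a run with mgr debugging
turned up would be useful detail to attach. A rough sketch (the debug
levels are my suggestion; adjust the id to match your mgr name):

    /usr/bin/ceph-mgr -f --cluster ceph --id $HOSTNAME \
        --debug_mgr 20 --debug_ms 1 2>&1 | tee ceph-mgr-crash.log

The same debug_mgr/debug_ms settings can go in ceph.conf instead if
you'd rather let systemd keep restarting it. There's also a note below
the quote on getting a symbol-resolved backtrace.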

On Tue, Sep 12, 2017 at 11:04 AM, Katie Holly <holly@xxxxxxxxx> wrote:
> Hi,
>
> I recently upgraded one of our clusters from Kraken to Luminous (the cluster was initialized with Jewel) on Ubuntu 16.04 and deployed ceph-mgr on all of our ceph-mon nodes with ceph-deploy.
>
> Related log entries after initial deployment of ceph-mgr:
>
> 2017-09-11 06:41:53.535025 7fb5aa7b8500  0 set uid:gid to 64045:64045 (ceph:ceph)
> 2017-09-11 06:41:53.535048 7fb5aa7b8500  0 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), pid 17031
> 2017-09-11 06:41:53.536853 7fb5aa7b8500  0 pidfile_write: ignore empty --pid-file
> 2017-09-11 06:41:53.541880 7fb5aa7b8500  1 mgr send_beacon standby
> 2017-09-11 06:41:54.547383 7fb5a1aec700  1 mgr handle_mgr_map Activating!
> 2017-09-11 06:41:54.547575 7fb5a1aec700  1 mgr handle_mgr_map I am now activating
> 2017-09-11 06:41:54.650677 7fb59dae4700  1 mgr start Creating threads for 0 modules
> 2017-09-11 06:41:54.650696 7fb59dae4700  1 mgr send_beacon active
> 2017-09-11 06:41:55.542252 7fb59eae6700  1 mgr send_beacon active
> 2017-09-11 06:41:55.542627 7fb59eae6700  1 mgr.server send_report Not sending PG status to monitor yet, waiting for OSDs
> 2017-09-11 06:41:57.542697 7fb59eae6700  1 mgr send_beacon active
> [... lots of "send_beacon active" messages ...]
> 2017-09-11 07:29:29.640892 7fb59eae6700  1 mgr send_beacon active
> 2017-09-11 07:29:30.866366 7fb59d2e3700 -1 *** Caught signal (Aborted) **
>  in thread 7fb59d2e3700 thread_name:ms_dispatch
>
>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>  1: (()+0x3de6b4) [0x55f6640e16b4]
>  2: (()+0x11390) [0x7fb5a8fef390]
>  3: (gsignal()+0x38) [0x7fb5a7f7f428]
>  4: (abort()+0x16a) [0x7fb5a7f8102a]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fb5a88c284d]
>  6: (()+0x8d6b6) [0x7fb5a88c06b6]
>  7: (()+0x8d701) [0x7fb5a88c0701]
>  8: (()+0x8d919) [0x7fb5a88c0919]
>  9: (()+0x2318ad) [0x55f663f348ad]
>  10: (()+0x3e91bd) [0x55f6640ec1bd]
>  11: (DaemonPerfCounters::update(MMgrReport*)+0x821) [0x55f663f96651]
>  12: (DaemonServer::handle_report(MMgrReport*)+0x1ae) [0x55f663f9b79e]
>  13: (DaemonServer::ms_dispatch(Message*)+0x64) [0x55f663fa8d64]
>  14: (DispatchQueue::entry()+0xf4a) [0x55f664438f3a]
>  15: (DispatchQueue::DispatchThread::entry()+0xd) [0x55f6641dc44d]
>  16: (()+0x76ba) [0x7fb5a8fe56ba]
>  17: (clone()+0x6d) [0x7fb5a80513dd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> [...]
>
>
> I tried to run ceph-mgr manually with
>> /usr/bin/ceph-mgr -f --cluster ceph --id $HOSTNAME --setuser ceph --setgroup ceph
> which fails after running for only a few seconds.
> stdout: http://xor.meo.ws/OyvoZF8v0aWq0D-rOOg2y6u03fp_yzYv.txt
> logs: http://xor.meo.ws/jcMyjabCfFbTcfZ8GOangLdSfSSqJffr.txt
> objdump: http://xor.meo.ws/oxo2q8h_oKAG6q7mARvNKkR_JdYjn89B.txt
>
> Has anyone seen an issue like this before, and does anyone know how to debug or even fix it?
>
>
> --
> Katie
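
Since the daemon aborts within seconds of starting, running it under
gdb should give you a backtrace with symbols resolved, assuming the
matching debug symbol packages for 12.2.0 are installed:

    gdb --args /usr/bin/ceph-mgr -f --cluster ceph --id $HOSTNAME --setuser ceph --setgroup ceph
    (gdb) run
    ... wait for the SIGABRT ...
    (gdb) thread apply all bt

That output, together with the objdump you already captured, would be
good to attach to the tracker issue above.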



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


