Re: v14.2.10 Nautilus crash

Hi Markus,

Yes, I think you should open a tracker issue and attach more of a
crashing OSD's log file (e.g. all the -1>, -2>, etc. lines before the
crash) and, if possible, the log from the mon leader as well.
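
A minimal Python sketch of pulling those numbered lines out of an OSD log. It assumes the `-N> timestamp ...` layout visible in the log excerpt quoted below, and that the dump ends at the `0>` entry. The function name `recent_events` and the `last_n` cutoff are illustrative choices, not part of any Ceph tool:

```python
import re
from typing import Iterable, List

# Matches the numbered "recent events" lines written before a crash, e.g.
#   "  -1547> 2020-06-30 21:13:51.171 ..."  or  "      0> 2020-06-30 ..."
# The layout is assumed from the excerpt in this thread, not from a spec.
EVENT_LINE = re.compile(r"^\s*(-?\d+)> ")


def recent_events(lines: Iterable[str], last_n: int = 50) -> List[str]:
    """Return up to `last_n` numbered dump lines, ending with the 0> line."""
    events: List[str] = []
    for line in lines:
        m = EVENT_LINE.match(line)
        if m:
            events.append(line.rstrip("\n"))
            if int(m.group(1)) == 0:
                break  # 0> is the crashing event; stop at the first dump
    return events[-last_n:]
```

For example, passing an open handle on `/var/log/ceph/ceph-osd.30.log` (assuming the default log location) would return the tail of the first dump in that file. Continuation lines of multi-line entries, such as the stack trace under `0>`, are not captured, so the raw excerpt is still worth attaching alongside.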

One strange thing is that the mon_warn_on_pool_pg_num_not_power_of_two
option is also present in v14.2.9 (it was added in v14.2.8). Which
version did you upgrade from? Perhaps setting it to false was the
trigger, but the bug itself lies elsewhere, in the OSD changes in
v14.2.10.
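
Going by the warning text ("pool(s) have non-power-of-two pg_num"), the condition is whether each pool's pg_num is a power of two. A small Python sketch of that test, plus a hypothetical helper that picks the next power of two as an expansion target. This is an illustration, not the mon's implementation, and the pool names and pg_num values are made up:

```python
from typing import Dict


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; n & (n - 1) clears the lowest set bit."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (illustrative helper, not a Ceph API)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def pools_needing_attention(pg_nums: Dict[str, int]) -> Dict[str, int]:
    """Map each pool whose pg_num would trigger the warning to a candidate target."""
    return {
        pool: next_power_of_two(n)
        for pool, n in pg_nums.items()
        if not is_power_of_two(n)
    }
```

For example, `pools_needing_attention({"rbd": 128, "data": 100})` returns `{"data": 128}`. Rounding up rather than to the nearest power of two is only one possible choice; the right target depends on the pool's size and the cluster's OSD count.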

Cheers, Dan


On Wed, Jul 1, 2020 at 9:09 AM Markus Binz <mbinz@xxxxxxxxx> wrote:
>
> Hello,
>
> Yesterday we upgraded a Mimic cluster to v14.2.10; everything was running and OK.
>
> There was a new warning, "2 pool(s) have non-power-of-two pg_num". To get a HEALTH_OK state until we can expand these pools,
> I found this config option to suppress the warning:
>
>   ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
>
> This resulted in a crash of 40 OSD processes (about 60% of the cluster).
>
> The OSDs could not be restarted; every restart hit the same crash.
>
> 2020-06-30 21:13:56.179 7fd2b7708c00 -1 osd.30 385679 log_to_monitors {default=true}
> *** Caught signal (Segmentation fault) **
>   in thread 7fd2a5813700 thread_name:fn_odsk_fstore
>   ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>   1: (()+0x11390) [0x7fd2b53a3390]
>   2: /usr/bin/ceph-osd() [0x87fd12]
>   3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
>   4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
>   5: (Context::complete(int)+0x9) [0x8fbfb9]
>   6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
>   7: (()+0x76ba) [0x7fd2b53996ba]
>   8: (clone()+0x6d) [0x7fd2b49a041d]
> 2020-06-30 21:13:56.199 7fd2a5813700 -1 *** Caught signal (Segmentation fault) **
>   in thread 7fd2a5813700 thread_name:fn_odsk_fstore
>
>   ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>   1: (()+0x11390) [0x7fd2b53a3390]
>   2: /usr/bin/ceph-osd() [0x87fd12]
>   3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
>   4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
>   5: (Context::complete(int)+0x9) [0x8fbfb9]
>   6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
>   7: (()+0x76ba) [0x7fd2b53996ba]
>   8: (clone()+0x6d) [0x7fd2b49a041d]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>   -1547> 2020-06-30 21:13:51.171 7fd2b7708c00 -1 missing 'type' file, inferring filestore from current/ dir
>    -738> 2020-06-30 21:13:56.179 7fd2b7708c00 -1 osd.30 385679 log_to_monitors {default=true}
>       0> 2020-06-30 21:13:56.199 7fd2a5813700 -1 *** Caught signal (Segmentation fault) **
>   in thread 7fd2a5813700 thread_name:fn_odsk_fstore
>
>   ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>   1: (()+0x11390) [0x7fd2b53a3390]
>   2: /usr/bin/ceph-osd() [0x87fd12]
>   3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
>   4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
>   5: (Context::complete(int)+0x9) [0x8fbfb9]
>   6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
>   7: (()+0x76ba) [0x7fd2b53996ba]
>   8: (clone()+0x6d) [0x7fd2b49a041d]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> This is a mixed cluster of Ubuntu Xenial and Bionic hosts; the crash happens on both.
>
> It looks like it happens when the new monmap arrives at the OSD.
>
> The only fix I was able to come up with was to downgrade ceph-osd to v14.2.9.
>
> Should I open a bug report?
>
> Regards
>
> Markus
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx