Hi Markus,

Yes, I think you should open a bug tracker ticket with more of the log from a crashing OSD (e.g. all the -1>, -2>, etc. lines before the crash), and also from the mon leader if possible.

Something strange is that the mon_warn_on_pool_pg_num_not_power_of_two feature is also present in v14.2.9 (it was added in v14.2.8). Which version did you upgrade from?

Perhaps setting it to false was the trigger, but the crash is somewhere else in the OSD changes in v14.2.10.

Cheers, Dan

On Wed, Jul 1, 2020 at 9:09 AM Markus Binz <mbinz@xxxxxxxxx> wrote:
>
> Hello,
>
> yesterday we upgraded a mimic cluster to v14.2.10; everything was running and OK.
>
> There was this new warning, "2 pool(s) have non-power-of-two pg_num", and to get a HEALTH_OK state until we can expand these pools,
> I found this config option to suppress the warning:
>
> ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
>
> which resulted in a crash of 40 OSD processes (about 60% of the cluster).
>
> No restart was possible; it was always the same crash.
>
> 2020-06-30 21:13:56.179 7fd2b7708c00 -1 osd.30 385679 log_to_monitors {default=true}
> *** Caught signal (Segmentation fault) **
>  in thread 7fd2a5813700 thread_name:fn_odsk_fstore
>  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>  1: (()+0x11390) [0x7fd2b53a3390]
>  2: /usr/bin/ceph-osd() [0x87fd12]
>  3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
>  4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
>  5: (Context::complete(int)+0x9) [0x8fbfb9]
>  6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
>  7: (()+0x76ba) [0x7fd2b53996ba]
>  8: (clone()+0x6d) [0x7fd2b49a041d]
> 2020-06-30 21:13:56.199 7fd2a5813700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7fd2a5813700 thread_name:fn_odsk_fstore
>
>  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>  1: (()+0x11390) [0x7fd2b53a3390]
>  2: /usr/bin/ceph-osd() [0x87fd12]
>  3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
>  4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
>  5: (Context::complete(int)+0x9) [0x8fbfb9]
>  6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
>  7: (()+0x76ba) [0x7fd2b53996ba]
>  8: (clone()+0x6d) [0x7fd2b49a041d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>  -1547> 2020-06-30 21:13:51.171 7fd2b7708c00 -1 missing 'type' file, inferring filestore from current/ dir
>   -738> 2020-06-30 21:13:56.179 7fd2b7708c00 -1 osd.30 385679 log_to_monitors {default=true}
>      0> 2020-06-30 21:13:56.199 7fd2a5813700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7fd2a5813700 thread_name:fn_odsk_fstore
>
>  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>  1: (()+0x11390) [0x7fd2b53a3390]
>  2: /usr/bin/ceph-osd() [0x87fd12]
>  3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
>  4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
>  5: (Context::complete(int)+0x9) [0x8fbfb9]
>  6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
>  7: (()+0x76ba) [0x7fd2b53996ba]
>  8: (clone()+0x6d) [0x7fd2b49a041d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> This is a mixed cluster of Ubuntu xenial and bionic; it happens on both.
>
> It looks like it happens when the new monmap arrives at the OSD.
>
> The only fix I was able to come up with was to downgrade ceph-osd to v14.2.9.
>
> Should I open a bug report?
>
> Regards
>
> Markus
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
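
For the bug report, the numbered lines before the crash (the -1>, -2>, etc. entries, ending at 0>) can be pulled out of an OSD log with a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `recent_events` and its `keep` parameter are made up for illustration, the entry pattern is taken from the `-1547>` / `-738>` / `0>` lines quoted in this thread, and the handling of `---` marker lines assumes the dump is bracketed by lines starting with `---`, which this thread does not show.

```python
import re

# Matches dump entries such as "  -738> 2020-06-30 21:13:56.179 ..." or "     0> ...".
# The pattern is inferred from the log excerpt in this thread.
ENTRY = re.compile(r"^\s*(-?\d+)> ")


def recent_events(lines, keep=50):
    """Return numbered dump entries with an index >= -keep, in file order.

    Lines that do not start a new entry (for example the stack frames after
    entry 0) are attached to the entry they follow. A line starting with
    "---" is assumed to be a dump marker and ends the current entry.
    """
    entries = []    # list of (index, [text lines])
    current = None  # entry that continuation lines are attached to
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            current = (int(match.group(1)), [line])
            entries.append(current)
        elif line.startswith("---"):
            current = None
        elif current is not None and line.strip():
            current[1].append(line)
    return [text for index, block in entries if index >= -keep for text in block]
```

To use it, open the log of one crashing OSD (the exact path depends on the installation), pass the file object to `recent_events`, and print the returned lines joined with newlines. If the log contains several crashes, the entries from every dump are returned, so trimming the log to the last restart first keeps the excerpt short.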