Hi Igor,
Good point! I forgot about the intersection of these two features.
(Just to confirm: the allocator change is new in Quincy, right? And it won't be backported? I want to check whether this is affecting any users yet.)
The fast shutdown was introduced partly because in Octopus we wanted to set the dead_epoch value in the OSDMap, and at the time the quickest way to do that was to kill the process so that peers would get ECONNREFUSED and report the peer dead.
Given this new feature, I think it makes sense to change the clean shutdown process to (1) stop responding to messages immediately (play dead) and (2) ask the mon to mark us dead (MOSDMarkMeDead instead of MOSDMarkMeDown). I think that's a matter of dropping incoming messages when we are in PREPARING_TO_STOP state or whatever it is. Note that it is very important that we stop processing requests *before* we are marked dead in order to prevent potentially stale reads--"dead" means the OSD process is truly dead (vs unresponsive/slow or possibly partitioned away from us but still serving reads for some clients).
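To make the ordering concrete, here is a minimal toy model of the two steps above. It is only a sketch and not Ceph code: ToyOSD, FakeMon, and their methods are hypothetical names made up for illustration, and only the PREPARING_TO_STOP state and the MOSDMarkMeDead message name come from the discussion. The point it shows is that the OSD stops serving requests first and only then asks to be marked dead.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PREPARING_TO_STOP = auto()
    STOPPED = auto()


class FakeMon:
    """Hypothetical stand-in for the monitor; it only records requests."""

    def __init__(self):
        self.requests = []

    def send(self, msg_type, osd_id):
        self.requests.append((msg_type, osd_id))


class ToyOSD:
    """Hypothetical OSD model, reduced to the shutdown ordering."""

    def __init__(self, osd_id, mon):
        self.osd_id = osd_id
        self.mon = mon
        self.state = State.ACTIVE
        self.served = []
        self.dropped = []
        self.events = []

    def handle_message(self, msg):
        # Once shutdown has begun, drop every incoming message ("play dead").
        if self.state is not State.ACTIVE:
            self.dropped.append(msg)
            return False
        self.served.append(msg)
        return True

    def clean_shutdown(self):
        # Step 1: stop responding before anyone is told we are dead.
        self.state = State.PREPARING_TO_STOP
        self.events.append("stopped_serving")
        # Step 2: only now ask the mon to mark us dead.
        self.mon.send("MOSDMarkMeDead", self.osd_id)
        self.events.append("mark_me_dead_sent")
        # A real shutdown would flush and unmount the store here (assumption).
        self.state = State.STOPPED
```

For example, a read that arrives after clean_shutdown() has started is dropped rather than served, so no client can get a stale read from an OSD that the map already considers dead:

```python
mon = FakeMon()
osd = ToyOSD(3, mon)
osd.handle_message("read A")   # served while ACTIVE
osd.clean_shutdown()
osd.handle_message("read B")   # dropped
```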
Anyway, this relates to https://tracker.ceph.com/issues/53327. I plan to take a closer look this week.
sage
On Mon, Nov 22, 2021 at 11:44 AM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
Hey folks,
recently I realized that the OSD's fast shutdown (which is the default
behavior) makes our new feature, dynamic restoration of the allocator's
map, work in a suboptimal mode. Because the shutdown is not graceful, the
OSD has to recover the allocator's map through onode enumeration on each
startup, which might take some time. Moreover, RocksDB apparently
performs a sort of recovery in this case too; maybe not that long, but
still noticeable.
Please also note that one might miss the above issues when using
vstart.sh, since it sets osd_fast_shutdown to false.
I created the following ticket to track the issue:
https://tracker.ceph.com/issues/53266
In addition, we've already added some extra tricks to the code for
this fast shutdown mode, e.g. the osd_fast_shutdown_notify_mon option.
Hence, given the above, shouldn't we reconsider the need for this fast
shutdown feature? IIUC the presence of various bugs along the regular
shutdown path was one of the primary rationales for introducing the new
mode. But IMO a properly working graceful shutdown is a mark of code
quality. And aren't we just moving the complexity/burden from the
shutdown procedure to the startup one this way? So maybe we should
invest in making shutdown clean enough instead?
Thanks,
--
Igor Fedotov
Ceph Lead Developer