Re: OSD fast shutdown and allocator's map restoration (aka NCB)

Sage Weil <sage@xxxxxxxxxxxx> · Mon, 22 Nov 2021 11:54:52 -0600

Hi Igor,
Good point!  I forgot about the intersection of these two features. 

(Just to confirm: the allocator change is new in quincy, right?  And won't be backported?  Just want to confirm whether this is affecting any users.)

The fast shutdown was introduced partly because in Octopus we wanted to set the dead_epoch value in the OSDMap, and at the time the quickest way to do that was to kill the process so that peers would get ECONNREFUSED and report the peer dead.

Given this new feature, I think it makes sense to change the clean shutdown process to (1) stop responding to messages immediately (play dead) and (2) ask the mon to mark us dead (MOSDMarkMeDead instead of MOSDMarkMeDown).  I think that's a matter of dropping incoming messages when we are in PREPARING_TO_STOP state or whatever it is.  Note that it is very important that we stop processing requests *before* we are marked dead in order to prevent potentially stale reads--"dead" means the OSD process is truly dead (vs unresponsive/slow or possibly partitioned away from us but still serving reads for some clients).
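
The ordering proposed above can be illustrated with a toy model. This is only a sketch, not Ceph code: ToyOSD, FakeMon, and the State enum are hypothetical names made up for the example, and the only things taken from the text are the PREPARING_TO_STOP state, the MOSDMarkMeDead message name, and the rule that request processing must stop before the OSD asks to be marked dead.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PREPARING_TO_STOP = auto()
    STOPPED = auto()


class FakeMon:
    """Hypothetical stand-in for the monitor; it only records requests."""

    def __init__(self):
        self.requests = []

    def send(self, msg_type, osd_id):
        self.requests.append((msg_type, osd_id))


class ToyOSD:
    """Hypothetical OSD model showing only the shutdown ordering."""

    def __init__(self, osd_id, mon):
        self.osd_id = osd_id
        self.mon = mon
        self.state = State.ACTIVE
        self.served = []

    def handle_message(self, msg):
        # Step 1 ("play dead"): once shutdown has begun, drop every message.
        if self.state is not State.ACTIVE:
            return False
        self.served.append(msg)
        return True

    def shutdown(self):
        # Stop serving *before* asking to be marked dead, so that no read
        # can be served after peers believe this OSD is gone.
        self.state = State.PREPARING_TO_STOP
        # Step 2: ask the mon to mark us dead rather than merely down.
        self.mon.send("MOSDMarkMeDead", self.osd_id)
        # The actual clean shutdown work (flushing state, etc.) would go here.
        self.state = State.STOPPED


mon = FakeMon()
osd = ToyOSD(3, mon)
served_before = osd.handle_message("read A")  # True: still active
osd.shutdown()
served_after = osd.handle_message("read B")   # False: dropped
```

The point of the sketch is that the state change happens before the message to the mon is sent, so there is no window in which the OSD is considered dead but still answers a read.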

Anyway, this relates to https://tracker.ceph.com/issues/53327.  I plan to take a closer look this week.

sage

On Mon, Nov 22, 2021 at 11:44 AM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
Hey folks,

recently I realized that the OSD's fast shutdown (which is the default behavior) causes our new feature, dynamic restoration of the allocator's map, to work in a suboptimal mode. Because the shutdown is not graceful, the allocator's map has to be recovered through onode enumeration on each OSD startup, which might take some time. Moreover, RocksDB apparently performs a sort of recovery in this case too; maybe not that long, but still noticeable.
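
The startup cost described above can be illustrated with a toy model. This is a sketch under stated assumptions, not BlueStore code: ToyStore, its JSON file, and the "saved map is consumed on startup" rule are all hypothetical. It only shows the two paths: load a map saved by a clean shutdown, or rebuild it by enumerating every onode.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical store: each onode maps to a list of (offset, length) extents."""

    def __init__(self, path, onodes):
        self.path = path      # where a clean shutdown saves the allocation map
        self.onodes = onodes  # {name: [(offset, length), ...]}

    def _enumerate_onodes(self):
        # Slow path: walk every onode and collect its extents.
        return sorted(ext for exts in self.onodes.values() for ext in exts)

    def clean_shutdown(self):
        # Graceful path: persist the allocation map for the next startup.
        with open(self.path, "w") as f:
            json.dump(self._enumerate_onodes(), f)

    def startup(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                allocated = [tuple(e) for e in json.load(f)]
            # Assumption of this toy: the saved map is good for one startup only.
            os.remove(self.path)
            return allocated, "loaded"
        # No saved map (fast shutdown or crash): rebuild from the onodes.
        return self._enumerate_onodes(), "rebuilt"


path = os.path.join(tempfile.mkdtemp(), "alloc.json")
store = ToyStore(path, {"a": [(0, 4096)], "b": [(8192, 4096), (4096, 4096)]})
after_fast_shutdown = store.startup()   # takes the "rebuilt" path
store.clean_shutdown()
after_clean_shutdown = store.startup()  # takes the "loaded" path
```

In the toy both paths return the same map; the difference is that the rebuild path has to touch every onode, which is the work that moves from shutdown to startup.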

Please also note that one might miss the above issues when using vstart.sh: it has osd_fast_shutdown set to false.

I created the following ticket to track the issue: https://tracker.ceph.com/issues/53266

Additionally, we've already added some extra tricks to the code for this fast shutdown mode, e.g. osd_fast_shutdown_notify_mon_option.

Hence, given the above, shouldn't we reconsider the need for this fast shutdown feature? IIUC the presence of various bugs along the regular shutdown path was one of the primary rationales for introducing the new mode. But IMO a properly working graceful shutdown is a sort of code quality mark. And aren't we just moving the complexity/burden from the shutdown procedure to the startup one this way? So maybe we had better invest in making shutdown clean enough?

Thanks,

-- 
Igor Fedotov
Ceph Lead Developer

