Re: HA and data recovery of CEPH

Aleksey Gutikov <aleksey.gutikov@xxxxxxxxxx> · Tue, 3 Dec 2019 17:07:05 +0300

That is true. When an OSD goes down it will take a few seconds for its
Placement Groups to re-peer with the other OSDs. During that period
writes to those PGs will stall for a couple of seconds.

I wouldn't say it's 40s, but it can take ~10s.

Hello,

In my experience, when an OSD crashes or is killed with kill -9 (any kind
of abnormal termination), failure handling goes through the following steps:
1) The failed OSD's peers detect that it does not respond. This can take
up to osd_heartbeat_grace + osd_heartbeat_interval seconds.
2) The peers send failure reports to the monitor.
3) The monitor makes a decision according to options from its own config
(mon_osd_adjust_heartbeat_grace, osd_heartbeat_grace,
mon_osd_laggy_halflife, mon_osd_min_down_reporters, ...) and finally marks
the OSD down in the OSDMap.
4) The monitor sends the updated OSDMap to OSDs and clients.
5) The OSDs start peering.
5.1) Peering itself is a complicated process. For example, we have
experienced PGs stuck in the inactive state due to
osd_max_pg_per_osd_hard_ratio.
6) Peering finishes (PG data continues moving) and clients can access the
affected PGs normally again. Clients also have their own timeouts, which
can affect the time to recover.
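
To make the arithmetic behind these steps concrete, here is a rough sketch of the time budget. The heartbeat values are what I believe the defaults to be (osd_heartbeat_interval = 6, osd_heartbeat_grace = 20), so please check them on your release, e.g. with `ceph config get osd osd_heartbeat_grace`. The durations for steps 2-6 are purely hypothetical placeholders, not measurements, and the class and field names are mine. Note also that mon_osd_adjust_heartbeat_grace can make the monitor's effective grace longer than the configured one, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class FailureTimeline:
    # Assumed defaults; verify against your own cluster configuration.
    osd_heartbeat_interval: float = 6.0  # seconds between heartbeat pings
    osd_heartbeat_grace: float = 20.0    # seconds without a reply before reporting

    # Hypothetical placeholders for steps that have no single config knob.
    mon_decision: float = 2.0     # steps 2-3: reports arrive, OSD is marked down
    map_propagation: float = 1.0  # step 4: new OSDMap reaches OSDs and clients
    peering: float = 10.0         # steps 5-6: PGs re-peer and become active

    def detection(self) -> float:
        """Upper bound for step 1: grace plus one heartbeat interval."""
        return self.osd_heartbeat_grace + self.osd_heartbeat_interval

    def total(self) -> float:
        """Upper bound for the whole stall, summing all phases."""
        return (self.detection() + self.mon_decision
                + self.map_propagation + self.peering)


timeline = FailureTimeline()
print(f"detection: up to {timeline.detection():.0f}s")
print(f"total stall: up to {timeline.total():.0f}s")
```

With these numbers, detection alone can take up to 26s, so a total close to 40s does not require anything unusual in the later steps.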

Again, in my experience, 40s with default settings is possible.

--

Best regards!
Aleksei Gutikov
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com