Re: HA and data recovery of CEPH

Thanks to all. We can now bring that failover duration down to around 25 seconds, which is the best result we can achieve.

BR

On Tue, Dec 3, 2019 at 10:30 PM Wido den Hollander <wido@xxxxxxxx> wrote:


On 12/3/19 3:07 PM, Aleksey Gutikov wrote:
>
>> That is true. When an OSD goes down it will take a few seconds for its
>> Placement Groups to re-peer with the other OSDs. During that period
>> writes to those PGs will stall for a couple of seconds.
>>
>> I wouldn't say it's 40s, but it can take ~10s.
>
> Hello,
>
> In my experience, when an OSD crashes or is killed with -9 (any kind of
> abnormal termination), the failure handling goes through the following
> steps:
> 1) The failed OSD's peers detect that it does not respond - this can
> take up to osd_heartbeat_grace + osd_heartbeat_interval seconds

If a 'Connection Refused' is detected the OSD will be marked as down
right away.
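For step 1, the upper bound is what Aleksey describes:
osd_heartbeat_interval + osd_heartbeat_grace. A quick way to check what
your cluster is actually running with (assuming Mimic or later, where
'ceph config get' is available; on older releases 'config get' via the
OSD admin socket does the same):

  ceph config get osd osd_heartbeat_interval
  ceph config get osd osd_heartbeat_grace
  ceph config get osd osd_fast_fail_on_connection_refused

The last option is the one behind the fast path above: when it is
enabled (the default in recent releases), a refused connection gets the
OSD reported and marked down without waiting out the grace period.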

> 2) The peers send failure reports to the monitor
> 3) The monitor makes a decision based on options from its own config
> (mon_osd_adjust_heartbeat_grace, osd_heartbeat_grace,
> mon_osd_laggy_halflife, mon_osd_min_down_reporters, ...) and finally
> marks the OSD down in the osdmap.

True.
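If you want to see which values your monitors are actually applying in
steps 2 and 3, the same config interface works against the mon section
(again assuming a release with 'ceph config get'; otherwise
'ceph daemon mon.<id> config get ...' on a monitor host):

  ceph config get mon mon_osd_min_down_reporters
  ceph config get mon mon_osd_adjust_heartbeat_grace
  ceph config get mon mon_osd_laggy_halflife
  ceph config get mon osd_heartbeat_grace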

> 4) The monitor sends the updated osdmap to OSDs and clients
> 5) OSDs start peering
> 5.1) Peering itself is a complicated process; for example, we have seen
> PGs stuck in the inactive state due to
> osd_max_pg_per_osd_hard_ratio.

I would say that 5.1 isn't relevant for most cases. Yes, it can happen,
but it's rare.
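If you do run into it, it shows up as PGs that stay inactive after the
OSD is already marked down. Something along these lines is usually
enough to spot it (standard commands, nothing version-specific beyond
'ceph config get'):

  ceph health detail
  ceph pg dump_stuck inactive
  ceph config get osd osd_max_pg_per_osd_hard_ratio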

> 6) Peering finishes (PG data keeps moving in the background) - clients
> can access the affected PGs normally again. Clients also have their own
> timeouts that can affect the time to recover.
>
> Again, according to my experience, 40s with default settings is possible.
>

40s is possible in certain scenarios. But I wouldn't say that's the
default for all cases.
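The easiest way to know where you stand is to measure it on a test
cluster: kill one OSD hard and time how long the affected PGs stay in
peering. A rough sketch, assuming osd.0 runs on that host and is safe
to kill there:

  # on the OSD host (test cluster only!):
  systemctl kill -s SIGKILL ceph-osd@0

  # in another terminal, watch the timestamps between the 'marked down'
  # message and the PGs going active again:
  ceph -w

Client-side timeouts (for example rados_osd_op_timeout for
librados/RBD clients) come on top of that, as Aleksey notes.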

Wido

>


--
The modern Unified Communications provider

https://www.portsip.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
