Well, the default is k + 1, so 11. Could it be that you reduced it
during a recovery phase but didn't set it back to the default?
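If it was lowered, something along these lines should show the current value and set it back (replace <pool> with your actual pool name):

    ceph osd pool get <pool> min_size         # show the current value
    ceph osd pool set <pool> min_size 11      # restore k+1 for a k=10 profile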
Quoting denispolom@xxxxxxxxx:
no, disk utilization is around 86%.
What is a safe value for min_size in this case?
18. 10. 2021 15:46:44 Eugen Block <eblock@xxxxxx>:
Hi,
min_size = k is not the safest option; it should only be used during
disaster recovery. But in this case it doesn't seem to be related to
the IO interruption. Are some disks utilized around 100% (iostat)
when this happens?
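Something like this on the OSD nodes would show it; the %util column
per device is what to look at:

    iostat -x 1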
Quoting Denis Polom <denispolom@xxxxxxxxx>:
Hi,
it's
min_size: 10
On 10/18/21 14:43, Eugen Block wrote:
What is your min_size for the affected pool?
Quoting Denis Polom <denispolom@xxxxxxxxx>:
Hi,
I have 18 OSD nodes in this cluster, and it happens even if only
one OSD daemon goes down or flaps.
Running
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be)
octopus (stable)
thx!
On 10/18/21 12:12, Eugen Block wrote:
Hi,
with this EC setup your pool min_size would be 11 (k+1), so if
one host goes down (or several OSDs fail on that host), your
clients should not be affected. But as soon as a second host
fails you'll notice an IO pause until at least one host has
recovered. Do you have more than 12 hosts in this cluster so
it can recover from one host failure?
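A quick way to verify the profile and the host count could be
something like this (<profile> is a placeholder for your EC
profile name):

    ceph osd erasure-code-profile get <profile>   # shows k, m and crush-failure-domain
    ceph osd tree | grep -c host                  # rough count of host buckets in the CRUSH map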
Regards,
Eugen
Quoting Denis Polom <denispolom@xxxxxxxxx>:
…
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx