Re: ceph IO are interrupted when OSD goes down

Hi,

sorry for the delay. So no, min_size is not the issue here. Is the 86% utilization an average, or does it spike to 100% during the interruptions? Does Ceph report slow requests? Have you asked the OSD daemon which operations took so long, e.g. with

ceph daemon osd.1 dump_historic_slow_ops

or

ceph daemon osd.1 dump_historic_ops_by_duration

Does the cluster (or OSDs) report anything during those interruptions?
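As a rough sketch of how one might correlate disk load with slow operations during such an interruption (the OSD id osd.1 is just an example, and iostat from the sysstat package is assumed to be installed):

# watch per-disk utilization on the OSD host while the interruption is ongoing
iostat -x 1

# check whether the cluster currently reports slow requests
ceph health detail

# list the slowest recent operations on a suspect OSD (run on the host where it lives)
ceph daemon osd.1 dump_historic_ops_by_duration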


Quoting Denis Polom <denispolom@xxxxxxxxx>:

No, it's actually not. It was set that way by design by a colleague of mine. But anyway, it's not related to this issue.

On 10/18/21 15:55, Eugen Block wrote:
Well, the default is k + 1, so 11. Could it be that you reduced it during a recovery phase but didn't set it back to the default?


Quoting denispolom@xxxxxxxxx:

No, disk utilization is around 86%.

What is a safe value for min_size in this case?

18. 10. 2021 15:46:44 Eugen Block <eblock@xxxxxx>:

Hi,

min_size = k is not the safest option; it should only be used during disaster recovery. But in this case it doesn't seem to be related to the IO interruption. Are some disks utilized at around 100% (iostat) when this happens?
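For completeness, the pool's min_size can be inspected and restored with the standard pool commands (the pool name "ecpool" is just a placeholder for your actual pool):

# show the current min_size of the pool
ceph osd pool get ecpool min_size

# restore the recommended default of k+1 (11 for this k=10, m=2 profile)
ceph osd pool set ecpool min_size 11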


Quoting Denis Polom <denispolom@xxxxxxxxx>:

Hi,

it's

min_size: 10


On 10/18/21 14:43, Eugen Block wrote:
What is your min_size for the affected pool?


Quoting Denis Polom <denispolom@xxxxxxxxx>:

Hi,

I have 18 OSD nodes in this cluster. And it does happen even if one OSD daemon goes down or flaps.

Running

ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)

thx!


On 10/18/21 12:12, Eugen Block wrote:
Hi,

with this EC setup your pool's min_size would be 11 (k+1), so if one host goes down (or several OSDs fail on that host), your clients should not be affected. But as soon as a second host fails, you'll notice an IO pause until at least one host has recovered. Do you have more than 12 hosts in this cluster, so that it can recover from one host failure?
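For reference, the EC profile and the resulting pool settings can be double-checked like this (the profile name "my-ec-profile" is a placeholder for whatever profile the pool actually uses):

# show k, m and crush-failure-domain of the erasure-code profile
ceph osd erasure-code-profile get my-ec-profile

# show size and min_size per pool, to confirm min_size is k+1 = 11
ceph osd pool ls detail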

Regards,
Eugen


Quoting Denis Polom <denispolom@xxxxxxxxx>:













_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



