Hi Mahnoosh,

you might want to set bluefs_buffered_io to true for every OSD. It looks like it's false by default in v15.2.12.

Thanks,
Igor

On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
We have a Ceph cluster with 408 OSDs, 3 mons and 3 RGWs. We upgraded the cluster from Nautilus 14.2.14 to Octopus 15.2.12 a few days ago. Since the upgrade, the garbage collector process that runs after the lifecycle process causes slow ops and makes some OSDs restart. In each run the garbage collector deletes about 1 million objects. Below are the logs of one of the OSDs before it restarts:

```
2021-07-18T00:44:38.807+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
2021-07-18T00:44:38.807+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
2021-07-18T00:44:39.847+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
2021-07-18T00:44:39.847+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
2021-07-18T00:44:39.847+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
2021-07-18T00:44:40.895+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
2021-07-18T00:44:40.895+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
2021-07-18T00:44:40.895+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
2021-07-18T00:44:41.859+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
2021-07-18T00:44:41.859+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
2021-07-18T00:44:41.859+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
2021-07-18T00:44:42.811+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
2021-07-18T00:44:42.811+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
```

What is a suitable configuration for GC under such a heavy delete load so that it doesn't cause slow ops? We had the same delete load on Nautilus and didn't have any problems with it.
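For reference, these are the RGW garbage-collection options usually looked at for delete-heavy workloads. The sketch below is only illustrative: the values are placeholders rather than recommendations for this cluster, the `client.rgw` config section is an assumption (a ceph.conf section on the RGW hosts works as well), and some of these options are only read at startup, so the RGW daemons may need a restart.

```
# Inspect the current GC backlog, including entries whose delay has not expired yet.
radosgw-admin gc list --include-all | head

# Illustrative placeholders only -- spread GC work across more shards and
# allow more concurrent I/O per GC cycle.
ceph config set client.rgw rgw_gc_max_objs 64               # number of GC shards (default 32)
ceph config set client.rgw rgw_gc_max_concurrent_io 20      # concurrent I/O during GC (default 10)
ceph config set client.rgw rgw_gc_processor_max_time 3600   # max seconds for one GC cycle
ceph config set client.rgw rgw_gc_processor_period 3600     # seconds between GC cycle starts
```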
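And a minimal sketch of how Igor's bluefs_buffered_io suggestion could be applied to all OSDs, assuming the centralized config database is in use; depending on the exact release, the OSDs may need a restart for the change to take effect.

```
# Set the option for every OSD via the monitors' config database.
ceph config set osd bluefs_buffered_io true

# Verify what a particular OSD (e.g. osd.60 from the log above) is actually
# running with -- run on that OSD's host via the admin socket.
ceph daemon osd.60 config get bluefs_buffered_io
```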