Re: octopus garbage collector makes slow ops

mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx> · Mon, 26 Jul 2021 02:58:15 +0430

Hi Igor,
Thanks for your response. This problem happens on my osds with hdd disks. I
set bluefs_buffered_io to true just for these osds, but that caused my
bucket index disks (which are ssd) to produce slow ops. I also tried setting
bluefs_buffered_io to true on the bucket index osds, but they filled the entire
memory (256G), so I had to set bluefs_buffered_io back to false on all
osds. Is that the only way to handle the garbage collector problem? Do you
have any ideas for the bucket index problem?
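
For reference, a per-device-class override like the one described above can be scripted. This is a minimal sketch, not taken from the thread: it assumes Ceph's centralized config store (`ceph config set`) and its `osd/class:<device-class>` mask, and the helper names `config_set_cmd` and `apply` are hypothetical. It also does not establish whether `bluefs_buffered_io` takes effect without restarting the OSDs.

```python
import shlex
import subprocess


def config_set_cmd(option, value, device_class=None):
    """Build a `ceph config set` argv for all OSDs, or only OSDs of one device class.

    Assumption: the `osd/class:<class>` mask syntax of the centralized config store.
    """
    who = "osd" if device_class is None else f"osd/class:{device_class}"
    # Render Python booleans as lowercase true/false for the CLI.
    if isinstance(value, bool):
        value = "true" if value else "false"
    return ["ceph", "config", "set", who, option, str(value)]


def apply(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Enable buffered BlueFS I/O only on HDD-class OSDs; SSD OSDs keep their setting.
    apply(config_set_cmd("bluefs_buffered_io", True, device_class="hdd"))
```

Run as-is, it only prints `ceph config set osd/class:hdd bluefs_buffered_io true`; pass `dry_run=False` to actually apply it.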

On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov <ifedotov@xxxxxxx> wrote:

> Hi Mahnoosh,
>
> you might want to set bluefs_buffered_io to true for every OSD.
>
> It looks like it's false by default in v15.2.12.
>
>
> Thanks,
>
> Igor
>
> On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
> > We have a ceph cluster with 408 osds, 3 mons and 3 rgws. We updated our
> > cluster from nautilus 14.2.14 to octopus 15.2.12 a few days ago. Since the
> > upgrade, the garbage collector process, which runs after the lifecycle
> > process, causes slow ops and makes some osds restart. In each run the
> > garbage collector deletes about 1 million objects. Below are the logs from
> > one of the osds before it restarts.
> >
> > ```
> > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >
> > ```
> > What is a suitable gc configuration for such a heavy delete load, so that
> > it doesn't cause slow ops? We had the same delete load on nautilus, but we
> > didn't have any problem with it.
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx