Re: octopus garbage collector makes slow ops

Hi Mark,
Thanks for your response. I did a manual compaction on all OSDs using
ceph-kvstore-tool. It reduced the number of slow ops, but it didn't solve
the problem completely.
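For reference, the per-OSD compaction looks roughly like the sketch below,
assuming a systemd-managed OSD with its data under /var/lib/ceph/osd/ceph-<id>
(osd.60 is just an example id; the OSD must be stopped while ceph-kvstore-tool
runs):

```
# Stop the OSD so its RocksDB can be opened offline (example id 60).
systemctl stop ceph-osd@60

# Compact the OSD's RocksDB column families via the BlueStore kv backend.
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-60 compact

# Bring the OSD back up and let it rejoin the cluster.
systemctl start ceph-osd@60
```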

On Mon, Jul 26, 2021 at 8:06 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:

> Yeah, I suspect that regular manual compaction might be the necessary
> workaround here if tombstones are slowing down iterator performance.
> If it is related to tombstones, it would be similar to what we saw when
> we tried to use deleterange and hit similar performance issues.
>
> I'm a little at a loss as to why Nautilus was better (other than the
> ill-fated bluefs_buffered_io change). There has been a fair amount of
> code churn related to some of this, both in Ceph and in RocksDB, though.
> Pacific is definitely more likely to get backports for this kind of
> thing IMHO.
>
>
> Mark
>
>
> On 7/26/21 6:19 AM, Igor Fedotov wrote:
> > Unfortunately I'm not an expert in RGW hence nothing to recommend from
> > that side.
> >
> > Apparently your issues are caused by bulk data removal - it appears
> > that RocksDB can hardly sustain such things and its performance
> > degrades. We've seen that plenty of times before.
> >
> > So far there are two known workarounds - manual DB compaction using
> > ceph-kvstore-tool, and setting bluefs_buffered_io to true. The latter
> > makes sense for the Ceph releases that have that parameter set to
> > false by default; v15.2.12 is one of them. And indeed that setting
> > might cause high RAM usage in some cases - you might want to look for
> > the relevant recent PRs on GitHub or ask Mark Nelson from RH for more
> > details.
> >
> > Nevertheless, the current upstream recommendation/default is to have
> > it set to true, as it greatly improves DB performance.
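For reference, a minimal sketch of flipping that option through the
centralized config database (assuming the mon config store is in use;
osd.60 is just an example id, and on some releases the OSDs may still need
a restart for the change to take effect):

```
# Check what a running OSD currently uses (example id 60).
ceph config get osd.60 bluefs_buffered_io

# Set it to true for all OSDs via the mon config database.
ceph config set osd bluefs_buffered_io true

# Confirm the running daemon picked it up; if not, restart the OSD.
ceph tell osd.60 config get bluefs_buffered_io
```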
> >
> >
> > So you might want to try compacting RocksDB as described above, but
> > please note that's a temporary workaround - the DB may start to
> > degrade again while removals are going on.
> >
> > There is also a PR to address the bulk removal issue in general:
> >
> > 1) https://github.com/ceph/ceph/pull/37496 (still pending review and
> > unlikely to be backported to Octopus).
> >
> >
> > One more question - do your HDD OSDs have additional fast (SSD/NVMe)
> > drives for their DB volumes, or do their DBs reside on the spinning
> > drives only? If the latter is true, I would strongly encourage you to
> > fix that by adding respective fast disks - RocksDB tends to work badly
> > when not deployed on SSDs...
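A quick way to check that is the OSD metadata reported to the monitors; a
minimal sketch (osd.60 is just an example id, and the exact field names can
differ a bit between releases):

```
# Look at the BlueFS/DB related fields, e.g. "bluefs_dedicated_db" and
# "bluefs_db_rotational" (field names may vary between releases).
ceph osd metadata 60 | grep -Ei 'bluefs|rotational'

# Or, on the OSD host, list the LVM layout and check whether block.db
# points at a separate SSD/NVMe device.
ceph-volume lvm list
```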
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> > On 7/26/2021 1:28 AM, mahnoosh shahidi wrote:
> >> Hi Igor,
> >> Thanks for your response. This problem happens on my OSDs with HDD
> >> disks. I set bluefs_buffered_io to true just for these OSDs, but it
> >> caused my bucket index disks (which are SSDs) to produce slow ops.
> >> I also tried setting bluefs_buffered_io to true on the bucket index
> >> OSDs, but they filled the entire memory (256 GB), so I had to set
> >> bluefs_buffered_io back to false on all OSDs. Is that the only way to
> >> handle the garbage collector problem? Do you have any ideas for the
> >> bucket index problem?
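Not a fix, but for anyone trying to pin down where that memory goes, a
minimal sketch of what can be inspected (osd.60 is just an example id; note
that osd_memory_target bounds the OSD process's own caches, while buffered
BlueFS I/O goes through the kernel page cache, which that target does not
cover):

```
# Check the memory target applied to one OSD (example id 60).
ceph config get osd.60 osd_memory_target

# On the OSD's host, inspect what the OSD's own caches are holding;
# this does not include kernel page cache used by buffered reads.
ceph daemon osd.60 dump_mempools
```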
> >>
> >> On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov <ifedotov@xxxxxxx
> >> <mailto:ifedotov@xxxxxxx>> wrote:
> >>
> >>     Hi Mahnoosh,
> >>
> >>     you might want to set bluefs_buffered_io to true for every OSD.
> >>
> >>     It looks like it's false by default in v15.2.12.
> >>
> >>
> >>     Thanks,
> >>
> >>     Igor
> >>
> >>     On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
> >>     > We have a Ceph cluster with 408 OSDs, 3 mons and 3 RGWs. We
> >>     > updated our cluster from Nautilus 14.2.14 to Octopus 15.2.12 a few
> >>     > days ago. After upgrading, the garbage collector process, which runs
> >>     > after the lifecycle process, causes slow ops and makes some OSDs
> >>     > restart. In each run the garbage collector deletes about 1 million
> >>     > objects. Below is the log of one of the OSDs before it restarts.
> >>     >
> >>     > ```
> >>     > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >>     > 2021-07-18T00:44:38.807+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> >>     > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >>     > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >>     > 2021-07-18T00:44:39.847+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> >>     > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >>     > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >>     > 2021-07-18T00:44:40.895+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> >>     > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >>     > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >>     > 2021-07-18T00:44:41.859+0430 7fd1cda76700  1 osd.60 1092400 not healthy; waiting to boot
> >>     > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >>     > 2021-07-18T00:44:42.811+0430 7fd1cda76700  1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >>     >
> >>     > ```
> >>     > What is a suitable configuration for the GC under such a heavy
> >>     > delete load so that it doesn't cause slow ops? We had the same
> >>     > delete load on Nautilus but didn't have any problems with it.
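For what it's worth, the RGW options that throttle how aggressively the
garbage collector drains its queue can be tuned; below is a minimal sketch
with purely hypothetical values (defaults and behaviour vary by release, and
using client.rgw as the config target assumes the centralized config database
reaches the RGW daemons):

```
# Hypothetical values, not recommendations -- check the defaults for your
# release before changing anything.

# Limit the GC's concurrent I/O so a large queue drains more gently.
ceph config set client.rgw rgw_gc_max_concurrent_io 4

# Trim the GC queue in smaller chunks per pass.
ceph config set client.rgw rgw_gc_max_trim_chunk 8

# Adjust how long deleted object tails must wait (in seconds) before the
# GC is allowed to remove them.
ceph config set client.rgw rgw_gc_obj_min_wait 7200
```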
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


