Hi Mark,

Thanks for your response. I did manual compaction on all osds using
ceph-kvstore-tool. It reduced the number of slow ops, but it didn't solve
the problem completely.
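
For reference, the compaction was done offline, roughly as in the sketch
below (the osd id and the data path are only examples for a default
/var/lib/ceph layout; each OSD has to be stopped while its DB is compacted):

```
# stop the OSD so ceph-kvstore-tool gets exclusive access to its RocksDB
systemctl stop ceph-osd@60

# compact the BlueStore key-value store of that OSD
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-60 compact

# start the OSD again and let the cluster settle before doing the next one
systemctl start ceph-osd@60
```

If stopping OSDs one by one is not an option, `ceph daemon osd.<id> compact`
over the admin socket should trigger the same RocksDB compaction online.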

On Mon, Jul 26, 2021 at 8:06 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:

> Yeah, I suspect that regular manual compaction might be the necessary
> workaround here if tombstones are slowing down iterator performance.
> If it is related to tombstones, it would be similar to what we saw when
> we tried to use deleterange and saw similar performance issues.
>
> I'm a little at a loss as to why nautilus was better (other than the
> ill-fated bluefs_buffered_io change). There has been a fair amount of
> code churn related to some of this in both Ceph and RocksDB, though.
> Pacific is definitely more likely to get backports for this kind of
> thing IMHO.
>
>
> Mark
>
>
> On 7/26/21 6:19 AM, Igor Fedotov wrote:
> > Unfortunately I'm not an expert in RGW, hence nothing to recommend
> > from that side.
> >
> > Apparently your issues are caused by bulk data removal - it appears
> > that RocksDB can hardly sustain such things and its performance
> > degrades. We've seen that plenty of times before.
> >
> > So far there are two known workarounds - manual DB compaction using
> > ceph-kvstore-tool, and setting bluefs_buffered_io to true. The latter
> > makes sense for the Ceph releases which got that parameter set to
> > false by default; v15.2.12 is one of them. And indeed that setting
> > might cause high RAM usage in some cases - you might want to look for
> > relevant recent PRs at github or ask Mark Nelson from RH for more
> > details.
> >
> > Nevertheless, the current upstream recommendation/default is to have
> > it set to true, as it greatly improves DB performance.
> >
> >
> > So you might want to try compacting RocksDB as per the above, but
> > please note that's a temporary workaround - the DB might start to
> > degrade again if removals keep going on.
> >
> > There is also a PR to address the bulk removal issue in general:
> >
> > 1) https://github.com/ceph/ceph/pull/37496 (still pending review and
> > unlikely to be backported to Octopus).
> >
> >
> > One more question - do your HDD OSDs have additional fast (SSD/NVMe)
> > drives for their DB volumes, or do their DBs reside on spinning
> > drives only? If the latter is true, I would strongly encourage you to
> > fix that by adding respective fast disks - RocksDB tends to work
> > badly when not deployed on SSDs...
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> > On 7/26/2021 1:28 AM, mahnoosh shahidi wrote:
> >> Hi Igor,
> >> Thanks for your response. This problem happens on my osds with hdd
> >> disks. I set bluefs_buffered_io to true just for these osds, but it
> >> caused my bucket index disks (which are ssd) to produce slow ops.
> >> I also tried setting bluefs_buffered_io to true on the bucket index
> >> osds, but they filled the entire memory (256G), so I had to set
> >> bluefs_buffered_io back to false on all osds. Is that the only way
> >> to handle the garbage collector problem? Do you have any ideas for
> >> the bucket index problem?
> >>
> >> On Thu, Jul 22, 2021 at 3:37 AM Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>
> >> Hi Mahnoosh,
> >>
> >> you might want to set bluefs_buffered_io to true for every OSD.
> >>
> >> It looks like it's false by default in v15.2.12.
> >>
> >>
> >> Thanks,
> >>
> >> Igor
> >>
> >> On 7/18/2021 11:19 PM, mahnoosh shahidi wrote:
> >> > We have a ceph cluster with 408 osds, 3 mons and 3 rgws. We
> >> > updated our cluster from nautilus 14.2.14 to octopus 15.2.12 a few
> >> > days ago. After upgrading, the garbage collector process, which
> >> > runs after the lifecycle process, causes slow ops and makes some
> >> > osds restart. In each run the garbage collector deletes about
> >> > 1 million objects. Below is the log of one of the osds before it
> >> > restarts.
> >> >
> >> > ```
> >> > 2021-07-18T00:44:38.807+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >> > 2021-07-18T00:44:38.807+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
> >> > 2021-07-18T00:44:39.847+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >> > 2021-07-18T00:44:39.847+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >> > 2021-07-18T00:44:39.847+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
> >> > 2021-07-18T00:44:40.895+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >> > 2021-07-18T00:44:40.895+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >> > 2021-07-18T00:44:40.895+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
> >> > 2021-07-18T00:44:41.859+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >> > 2021-07-18T00:44:41.859+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >> > 2021-07-18T00:44:41.859+0430 7fd1cda76700 1 osd.60 1092400 not healthy; waiting to boot
> >> > 2021-07-18T00:44:42.811+0430 7fd1cda76700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fd1b4243700' had timed out after 15
> >> > 2021-07-18T00:44:42.811+0430 7fd1cda76700 1 osd.60 1092400 is_healthy false -- internal heartbeat failed
> >> > ```
> >> >
> >> > What is a suitable gc configuration for such a heavy delete
> >> > process so that it doesn't cause slow ops? We had the same delete
> >> > load in nautilus but we didn't have any problem with it.
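
Coming back to the gc question quoted just above: how aggressively radosgw
trims garbage is controlled by the rgw_gc_* options. A rough, untested
sketch of dialing gc down so a ~1 million object delete batch is processed
over a longer window ("client.rgw.your-gateway" is an example daemon name
and the values are only illustrative, not recommendations):

```
# repeat for each radosgw instance, then restart the radosgw processes
ceph config set client.rgw.your-gateway rgw_gc_max_concurrent_io 5      # default 10; fewer parallel gc IOs hitting the OSDs
ceph config set client.rgw.your-gateway rgw_gc_processor_max_time 1800  # default 3600 s; shorter gc cycles
ceph config set client.rgw.your-gateway rgw_gc_obj_min_wait 28800       # default 7200 s; wait longer after deletion before gc may remove an object
```

Whether slowing gc down like this is acceptable depends on how quickly the
deleted capacity has to be reclaimed.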