Hi Istvan,

Yeah, it's a known bug. Hence my previous recommendation to first upgrade your cluster to a more recent version of Ceph and only then reshard the multi-site synced bucket.

Cheers,
Frédéric.

----- On 29 Nov 24, at 13:11, Istvan Szabo, Agoda <Istvan.Szabo@xxxxxxxxx> wrote:

> The cluster from the reshard topic is running Quincy 17.2.7, but I tested the reshard today and the objects are gone.
> Istvan
>
> From: Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>
> Sent: Friday, November 29, 2024 5:17:27 PM
> To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxx>
> Subject: Re: Snaptriming speed degrade with pg increase
>
> ----- On 29 Nov 24, at 11:11, Istvan Szabo, Agoda <Istvan.Szabo@xxxxxxxxx> wrote:
>
>> We increased from 9 servers to 11, so roughly 20% more capacity and performance. This is a different cluster, purely RBD.
>
> I see, so big objects. You might want to increase osd_max_trimming_pgs and possibly osd_pg_max_concurrent_snap_trims and see how it goes.
>
>> (For the other topic: the bucket can't be resharded because, in multi-site, resharding makes all the data disappear on the remote site. We would need to create a new bucket with a higher shard count and migrate the data to it first.)
>
> Hum... You have fallen significantly behind on Ceph versions, which must be hindering you in many operational tasks today. Another option would be to catch up and reshard on a recent version in multi-site mode.
> Frédéric.
>
>> Istvan
>>
>> From: Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>
>> Sent: Friday, November 29, 2024 4:58:52 PM
>> To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
>> Cc: Ceph Users <ceph-users@xxxxxxx>
>> Subject: Re: Snaptriming speed degrade with pg increase
>>
>> Hi Istvan,
>>
>> Did the PG split involve using more OSDs than before? If so, then increasing these values (apart from the sleep) should not have a negative impact on client I/O compared to before the split, and it should accelerate the whole process.
>>
>> Did you reshard the buckets as discussed in the other thread?
>>
>> Regards,
>> Frédéric.
>>
>> ----- On 29 Nov 24, at 3:30, Istvan Szabo, Agoda Istvan.Szabo@xxxxxxxxx wrote:
>>
>> > Hi,
>> >
>> > When we scale the placement groups on a pool located in an all-NVMe cluster, the snaptrimming speed degrades a lot.
>> >
>> > Currently we are running with these values so as not to degrade client ops while still making some progress on snaptrimming, but it is terrible (Octopus 15.2.17 on Ubuntu 20.04):
>> >
>> > --osd_max_trimming_pgs=2
>> > --osd_snap_trim_sleep=0.1
>> > --osd_pg_max_concurrent_snap_trims=2
>> >
>> > We have a big pool which used to have 128 PGs, and a snaptrimming run on it took around 45-60 minutes.
>> >
>> > Since maintenance on the cluster is impossible with 600 GB PG sizes, because it can easily max out the cluster (which it did), we increased to 1024 PGs, and the snaptrimming duration went up to 3.5 hours.
>> >
>> > Is there any good solution that we are missing to fix this?
>> >
>> > On the hardware level I've changed the server profile to tune some NUMA settings, but it seems that didn't help either.
>> > Thank you
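
A minimal sketch of how the snaptrim settings discussed above can be adjusted cluster-wide at runtime; the values are simply the ones quoted in the thread, not a recommendation, and on Octopus and later these options can normally be changed without restarting the OSDs:

    # Persist the settings in the MON config database (applies to all OSDs):
    ceph config set osd osd_max_trimming_pgs 2
    ceph config set osd osd_pg_max_concurrent_snap_trims 2
    ceph config set osd osd_snap_trim_sleep 0.1

    # Or push them to the running OSDs immediately without persisting:
    ceph tell 'osd.*' injectargs '--osd_max_trimming_pgs=2 --osd_pg_max_concurrent_snap_trims=2 --osd_snap_trim_sleep=0.1'

    # Check the values in effect on a given OSD:
    ceph config show osd.0 | grep -E 'snap_trim|trimming'

Raising osd_max_trimming_pgs and osd_pg_max_concurrent_snap_trims lets more PGs trim in parallel, while osd_snap_trim_sleep throttles each trim operation; lowering the sleep speeds up trimming at the cost of client latency.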
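
On the resharding side, a rough sketch of the commands typically involved once the cluster has been upgraded to a release where resharding a multi-site synced bucket is safe, as recommended above; the bucket name and shard count below are placeholders, and sync status should be verified on both zones before and after:

    # Current shard count and object count for the bucket (placeholder name):
    radosgw-admin bucket stats --bucket=mybucket

    # Multi-site replication state, to be checked on both zones:
    radosgw-admin sync status
    radosgw-admin bucket sync status --bucket=mybucket

    # Manual reshard to a higher shard count (example value):
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=101

    # Progress of an ongoing reshard:
    radosgw-admin reshard status --bucket=mybucket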