(adding back the list)

On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer
<joachim.kraftmayer@xxxxxxxxx> wrote:

> I added the questions and answers below.
>
> ___________________________________
> Best Regards,
> Joachim Kraftmayer
> CEO | Clyso GmbH
>
> Clyso GmbH
> p: +49 89 21 55 23 91 2
> a: Loristraße 8 | 80335 München | Germany
> w: https://clyso.com | e: joachim.kraftmayer@xxxxxxxxx
>
> We are hiring: https://www.clyso.com/jobs/
> ---
> CEO: Dipl. Inf. (FH) Joachim Kraftmayer
> Registered office: Utting am Ammersee
> Commercial register at the district court of Augsburg
> Commercial register number: HRB 25866
> VAT ID no.: DE275430677
>
> On 21.03.23 at 11:14, Gauvain Pocentek wrote:
>
> Hi Joachim,
>
> On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer
> <joachim.kraftmayer@xxxxxxxxx> wrote:
>
>> Which Ceph version are you running, is mclock active?
>
> We're using Quincy (17.2.5), upgraded step by step from Luminous if I
> remember correctly.
>
> Did you recreate the OSDs? If yes, at which version?

I actually don't remember all the history, but I think we added the HDD
nodes while running Pacific.

> mclock seems active, set to the high_client_ops profile. HDD OSDs have
> very different settings for max capacity IOPS:
>
> osd.137   basic   osd_mclock_max_capacity_iops_hdd    929.763899
> osd.161   basic   osd_mclock_max_capacity_iops_hdd   4754.250946
> osd.222   basic   osd_mclock_max_capacity_iops_hdd    540.016984
> osd.281   basic   osd_mclock_max_capacity_iops_hdd   1029.193945
> osd.282   basic   osd_mclock_max_capacity_iops_hdd   1061.762870
> osd.283   basic   osd_mclock_max_capacity_iops_hdd    462.984562
>
> We haven't set those explicitly, could they be the reason for the slow
> recovery?
>
> I recommend disabling mclock for now, and yes, we have seen slow
> recovery caused by mclock.

Stupid question: how do you do that? I've looked through the docs but
could only find information about changing the settings. [See the
command sketch at the end of this message.]

> Bonus question: does ceph set that itself?
>
> Yes, and if you have a setup with HDD + SSD (db & wal), the discovery
> does not work correctly.

Good to know!

Gauvain

> Thanks!
>
> Gauvain
>
>> Joachim
>>
>> ___________________________________
>> Clyso GmbH - Ceph Foundation Member
>>
>> On 21.03.23 at 06:53, Gauvain Pocentek wrote:
>> > Hello all,
>> >
>> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB.
>> > This pool has 9 servers, each with 12 disks of 16 TB. About 10 days
>> > ago we lost a server and we've removed its OSDs from the cluster.
>> > Ceph has started to remap and backfill as expected, but the process
>> > has been getting slower and slower. Today the recovery rate is around
>> > 12 MiB/s and 10 objects/s. All the remaining unclean PGs are
>> > backfilling:
>> >
>> >   data:
>> >     volumes: 1/1 healthy
>> >     pools:   14 pools, 14497 pgs
>> >     objects: 192.38M objects, 380 TiB
>> >     usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>> >     pgs:     771559/1065561630 objects degraded (0.072%)
>> >              1215899/1065561630 objects misplaced (0.114%)
>> >              14428 active+clean
>> >              50    active+undersized+degraded+remapped+backfilling
>> >              18    active+remapped+backfilling
>> >              1     active+clean+scrubbing+deep
>> >
>> > We've checked the health of the remaining servers, and everything
>> > looks fine (CPU/RAM/network/disks).
>> >
>> > Any hints on what could be happening?
>> >
>> > Thank you,
>> > Gauvain
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
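
[Follow-up note] "Disabling mclock" as suggested above comes down to
switching the OSD op queue back to WPQ. A rough sketch of the relevant
commands, assuming a Quincy cluster and the standard ceph CLI (osd.137
is one of the IDs from the config dump above; double-check option names
against the docs for your exact release):

# Check which scheduler an OSD is currently running with:
ceph config show osd.137 osd_op_queue

# Switch all OSDs back to the WPQ scheduler. osd_op_queue is only read
# at start-up, so the OSDs have to be restarted afterwards (e.g. one
# host at a time) for the change to take effect:
ceph config set osd osd_op_queue wpq

# Once back on WPQ, the classic recovery/backfill knobs apply again:
ceph config set osd osd_max_backfills 2

# Alternative while staying on mclock: switch to a recovery-friendly
# profile, which takes effect at runtime:
ceph config set osd osd_mclock_profile high_recovery_ops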
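
On the wildly different osd_mclock_max_capacity_iops_hdd values: those
are measured by each OSD itself at start-up, and as noted above the
measurement can produce unrealistically high numbers on hybrid
HDD + SSD (DB/WAL) setups. A sketch of how one might clear the skewed
per-OSD overrides so a sane value is used instead (again, osd.137 is
just an example taken from the dump above):

# List the stored per-OSD capacity values:
ceph config dump | grep osd_mclock_max_capacity_iops

# Remove an implausible per-OSD override so the default applies again
# (repeat for each affected OSD):
ceph config rm osd.137 osd_mclock_max_capacity_iops_hdd

# Or pin an explicit value for all HDD OSDs; 315 is only an example
# figure for a typical spinner, pick one that matches your drives:
ceph config set osd osd_mclock_max_capacity_iops_hdd 315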