I would love to see those types of speeds. I tried setting it all the
way to 0 and got nothing; I did that before I sent the first email,
maybe it was your old post I got it from.

osd_recovery_sleep_hdd  0.000000  override  (mon[0.000000])

On Mon, Jun 27, 2022 at 9:27 PM Robert Gallop <robert.gallop@xxxxxxxxx> wrote:

> I saw a major boost after having sleep_hdd set to 0. Only after that
> did I start staying at around 500 MiB/s to 1.2 GiB/s and 1.5k obj/sec
> to 2.5k obj/sec.
>
> Eventually it tapered back down, but for me sleep was the key, and
> specifically in my case:
>
> osd_recovery_sleep_hdd
>
> On Mon, Jun 27, 2022 at 11:17 AM Curt <lightspd@xxxxxxxxx> wrote:
>
>> On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder <frans@xxxxxx> wrote:
>>
>> > I think this is just how Ceph is. Maybe you should post the output
>> > of "ceph status", "ceph osd pool stats" and "ceph df" so that we
>> > can get an idea whether what you are looking at is expected or not.
>> > As I wrote before, object recovery is throttled and the recovery
>> > bandwidth depends heavily on object size. The interesting question
>> > is how many objects per second are recovered/rebalanced.
>>
>>   data:
>>     pools:   11 pools, 369 pgs
>>     objects: 2.45M objects, 9.2 TiB
>>     usage:   20 TiB used, 60 TiB / 80 TiB avail
>>     pgs:     512136/9729081 objects misplaced (5.264%)
>>              343 active+clean
>>              22  active+remapped+backfilling
>>
>>   io:
>>     client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
>>     recovery: 34 MiB/s, 8 objects/s
>>
>> Pool 12 is the only one with any stats.
>>
>>   pool EC-22-Pool id 12
>>     510048/9545052 objects misplaced (5.344%)
>>     recovery io 36 MiB/s, 9 objects/s
>>     client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr
>>
>> --- RAW STORAGE ---
>> CLASS  SIZE    AVAIL   USED    RAW USED  %RAW USED
>> hdd    80 TiB  60 TiB  20 TiB  20 TiB    25.45
>> TOTAL  80 TiB  60 TiB  20 TiB  20 TiB    25.45
>>
>> --- POOLS ---
>> POOL                        ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> .mgr                         1    1  152 MiB       38  457 MiB      0    9.2 TiB
>> 21BadPool                    3   32    8 KiB        1   12 KiB      0     18 TiB
>> .rgw.root                    4   32  1.3 KiB        4   48 KiB      0    9.2 TiB
>> default.rgw.log              5   32  3.6 KiB      209  408 KiB      0    9.2 TiB
>> default.rgw.control          6   32      0 B        8      0 B      0    9.2 TiB
>> default.rgw.meta             7    8  6.7 KiB       20  203 KiB      0    9.2 TiB
>> rbd_rep_pool                 8   32  2.0 MiB        5  5.9 MiB      0    9.2 TiB
>> default.rgw.buckets.index    9    8  2.0 MiB       33  5.9 MiB      0    9.2 TiB
>> default.rgw.buckets.non-ec  10   32  1.4 KiB        0  4.3 KiB      0    9.2 TiB
>> default.rgw.buckets.data    11   32  232 GiB   61.02k  697 GiB   2.41    9.2 TiB
>> EC-22-Pool                  12  128  9.8 TiB    2.39M   20 TiB  41.55     14 TiB
>>
>> > Maybe provide the output of the first two commands for
>> > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each
>> > (wait a bit after setting these and then collect the output).
>> > Include the applied values for osd_max_backfills* and
>> > osd_recovery_max_active* for one of the OSDs in the pool (ceph
>> > config show osd.ID | grep -e osd_max_backfills -e
>> > osd_recovery_max_active).
>>
>> I didn't notice any speed difference with sleep values changed, but
>> I'll grab the stats between changes when I have a chance.
>>
>> ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active'
>> osd_max_backfills            1000  override  mon[5]
>> osd_recovery_max_active      1000  override
>> osd_recovery_max_active_hdd  1000  override  mon[5]
>> osd_recovery_max_active_ssd  1000  override
>>
>> > I don't really know if on such a small cluster one can expect more
>> > than what you see.
>> > It has nothing to do with network speed if you have a 10G line.
>> > However, recovery is something completely different from a full
>> > link-speed copy.
>> >
>> > I can tell you that boatloads of tiny objects are a huge pain for
>> > recovery, even on SSD. Ceph doesn't RAID sections of disks against
>> > each other, but object for object. This might be a feature request:
>> > that PG space allocation and recovery should follow the model of
>> > LVM extents (ideally match with LVM extents) to allow
>> > recovering/rebalancing larger chunks of storage in one go,
>> > containing parts of a large object or of many small objects.
>> >
>> > Best regards,
>> > =================
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > ________________________________________
>> > From: Curt <lightspd@xxxxxxxxx>
>> > Sent: 27 June 2022 17:35:19
>> > To: Frank Schilder
>> > Cc: ceph-users@xxxxxxx
>> > Subject: Re: Re: Ceph recovery network speed
>> >
>> > Hello,
>> >
>> > I had already increased/changed those variables previously. I
>> > increased pg_num to 128, which increased the number of PGs
>> > backfilling, but speed is still only 30 MiB/s on average and it has
>> > been backfilling 23 PGs for the last several hours. Should I
>> > increase it higher than 128?
>> >
>> > I'm still trying to figure out if this is just how Ceph is or if
>> > there is a bottleneck somewhere. For example, if I sftp a 10 GB
>> > file between servers it's done in a couple of minutes or less. Am I
>> > thinking of this wrong?
>> >
>> > Thanks,
>> > Curt
>> >
>> > On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder <frans@xxxxxx> wrote:
>> >
>> > Hi Curt,
>> >
>> > as far as I understood, a 2+2 EC pool is recovering, which makes 1
>> > OSD per host busy. My experience is that the algorithm for
>> > selecting PGs to backfill/recover is not very smart. It could
>> > simply be that it doesn't find more PGs without violating some of
>> > these settings:
>> >
>> > osd_max_backfills
>> > osd_recovery_max_active
>> >
>> > I have never observed the second parameter to change anything (try
>> > anyway). However, the first one has a large impact. You could try
>> > increasing it slowly until recovery moves faster. Another parameter
>> > you might want to try is
>> >
>> > osd_recovery_sleep_[hdd|ssd]
>> >
>> > Be careful, as this will impact client IO. I could reduce the sleep
>> > for my HDDs to 0.05. With your workload pattern, this might be
>> > something you can tune as well.
>> >
>> > Having said that, I think you should increase your PG count on the
>> > EC pool as soon as the cluster is healthy. You have only about 20
>> > PGs per OSD, and large PGs will take unnecessarily long to recover.
>> > A higher PG count will also make it easier for the scheduler to
>> > find PGs for recovery/backfill. Aim for a number between 100 and
>> > 200. Give the pool(s) with the most data (#objects) the most PGs.
>> >
>> > Best regards,
>> > =================
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > ________________________________________
>> > From: Curt <lightspd@xxxxxxxxx>
>> > Sent: 24 June 2022 19:04
>> > To: Anthony D'Atri; ceph-users@xxxxxxx
>> > Subject: Re: Ceph recovery network speed
>> >
>> > 2 PGs shouldn't take hours to backfill, in my opinion. Just 2 TB
>> > enterprise HDDs.
>> >
>> > Take this log entry below: 72 minutes and still backfilling
>> > undersized? Should it be that slow?
>> >
>> > pg 12.15 is stuck undersized for 72m, current state
>> > active+undersized+degraded+remapped+backfilling, last acting
>> > [34,10,29,NONE]
>> >
>> > Thanks,
>> > Curt
>> >
>> > On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri
>> > <anthony.datri@xxxxxxxxx> wrote:
>> >
>> > > Your recovery is slow *because* there are only 2 PGs backfilling.
>> > >
>> > > What kind of OSD media are you using?
>> > >
>> > > > On Jun 24, 2022, at 09:46, Curt <lightspd@xxxxxxxxx> wrote:
>> > > >
>> > > > Hello,
>> > > >
>> > > > I'm trying to understand why my recovery is so slow with only 2
>> > > > PGs backfilling. I'm only getting speeds of 3-4 MiB/s on a 10G
>> > > > network. I have tested the speed between machines with a few
>> > > > tools and all confirm 10G speed. I've tried changing various
>> > > > settings of priority and recovery sleep hdd, but still the
>> > > > same. Is this a configuration issue or something else?
>> > > >
>> > > > It's just a small cluster right now with 4 hosts, 11 OSDs each.
>> > > > Please let me know if you need more information.
>> > > >
>> > > > Thanks,
>> > > > Curt
>> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
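As a rough sanity check on the figures quoted in this thread, the
observed object rate already implies a multi-hour backfill. A minimal
back-of-the-envelope sketch (my own illustration, not from the thread;
it assumes the recovery rate stays constant, which it rarely does):

```python
def recovery_eta_hours(misplaced_objects: int, objects_per_sec: float) -> float:
    """Naive recovery ETA: misplaced objects / observed object recovery rate."""
    return misplaced_objects / objects_per_sec / 3600.0

# Figures from the "ceph status" output quoted above:
# 512136 misplaced objects recovering at 8 objects/s.
print(f"{recovery_eta_hours(512136, 8):.1f} hours")  # ~17.8 hours
```

This is also why, for pools full of small objects, objects/s rather
than MiB/s is the number to watch, per Frank's point about per-object
recovery.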