Hi,

the presentation from Dan van der Ster should answer all your questions:
https://youtu.be/9lsByOMdEwc?si=GfvICgZCnT2L93Tn

If you have no time, start at minute 17:00.

Joachim

joachim.kraftmayer@xxxxxxxxx
www.clyso.com

Janne Johansson <icepic.dz@xxxxxxxxx> wrote on Sat, 7 Sept 2024 at 09:00:

> pgremapper (and the Python equivalent) lets you take all the PGs that a
> new disk has been assigned, while they are still empty and misplaced,
> and mark them as correct right where they currently are. This means
> that after you run one of the remappers, the upmap entries tell the
> cluster to stay as it is, even though new, empty OSDs have arrived with
> their correct crush weights and all.
>
> So you set norebalance, add the new OSDs, and the cluster shudders for
> a short while when the new empty PGs are created on the new drives.
> Then you have lots and lots of misplaced PGs, which norebalance
> prevents from starting backfills.
>
> Next you run the remapper and "fix" the upmap so that all the current
> placements are considered "correct", and hence the PGs that were
> supposed to move stop being misplaced. After this, you remove
> norebalance to allow moves to start happening.
>
> At this point little or no movement should occur. What happens next is
> that the balancer notices there actually is more space and figures out
> that the optimal layout is more or less the same as above, with lots of
> PGs going to the new OSDs. But it does this with the max misplaced
> ratio in mind, and it "moves" the PGs simply by removing the upmap
> entries that forced them to stay in place. As time passes it moves a
> few PGs at a time, so most of the time your other OSDs look perfectly
> healthy and keep doing the scrubs and all the other things a healthy
> OSD should do, things they would not get to if they had a long queue of
> backfills eating up all the slots for non-client IO.
>
> While you can do soft additions by increasing crush weights step by
> step, that potentially causes a lot more movement: a host whose OSDs go
> from weight 0.1 to 0.2 might not place all PGs in the same spots in
> those two cases, so you can get movement within the host and so on from
> the recalculated pseudorandom placements on every increase.
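For anyone who wants to script the sequence Janne describes above, here is a
rough sketch that drives the ceph CLI from Python. The ceph subcommands are
standard; the pgremapper invocation ("cancel-backfill --yes") is my reading
of its README and should be verified against the version you actually
install.

#!/usr/bin/env python3
"""Sketch of the add-OSDs-without-a-backfill-storm workflow described above:
set norebalance, add the new OSDs, pin the resulting misplaced PGs where
their data currently lives, then let the upmap balancer drain data onto the
new OSDs at its own pace. Assumes the ceph CLI and the pgremapper binary are
both in $PATH."""
import subprocess


def run(*cmd: str) -> str:
    """Run a command, raise on failure, return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# 1. Keep the cluster from scheduling backfills while hardware is added.
run("ceph", "osd", "set", "norebalance")

# 2. Add the new OSDs (orchestrator, ceph-volume, ...) at full crush weight.
#    The PGs mapped onto them become misplaced, but norebalance keeps the
#    backfills from starting.
input("Add the new OSDs now, then press Enter to continue... ")

# 3. Create upmap entries pinning every misplaced PG to where its data
#    currently lives, so nothing is considered misplaced any more.
run("pgremapper", "cancel-backfill", "--yes")  # flag assumed, check the README

# 4. Allow rebalancing again; little or nothing should move at this point.
run("ceph", "osd", "unset", "norebalance")

# 5. From here the upmap balancer removes the pinning entries a few PGs at a
#    time, bounded by the max misplaced ratio, so data drains gradually.
print(run("ceph", "balancer", "status"))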
>
> On Sat, 7 Sep 2024 at 00:15, Eugen Block <eblock@xxxxxx> wrote:
> >
> > I can't say anything about the pgremapper, but have you tried
> > increasing the crush weight gradually? Add new OSDs with a crush
> > initial weight of 0 and then increase it in small steps. I haven't
> > used that approach in years, but maybe it can help here. Or are all
> > OSDs already up and in? Or you could reduce the max misplaced ratio
> > to 1% or even lower (the default is 5%).
> >
> > Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
> >
> > > Forgot to paste it: I want to reduce this recovery rate:
> > > recovery: 0 B/s, 941.90k keys/s, 188 objects/s
> > > to around 200-300 keys/s.
> > >
> > > ________________________________
> > > From: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> > > Sent: Friday, September 6, 2024 11:18 PM
> > > To: Ceph Users <ceph-users@xxxxxxx>
> > > Subject: Somehow throttle recovery even further than the basic options?
> > >
> > > Hi,
> > >
> > > Four years ago we created our cluster on Octopus with 4 OSDs on
> > > every disk (SSDs and NVMe drives). The 15 TB SSDs still work fine
> > > with 4 OSDs each, but the small 1.8 TB NVMes holding the index pool
> > > do not. Each new NVMe OSD added to the existing nodes generates
> > > slow ops, even with scrubbing off, recovery_op_priority at 1, and
> > > backfill and recovery limits at 1 each. I even turned off all the
> > > heavy index pool sync mechanisms, but read latency is still high,
> > > which means recovery ops push it even higher.
> > >
> > > I'm trying to add resources to the cluster to spread out the 2048
> > > index pool PGs (6144 PGs with replica 3), but I can't make the
> > > process any more gentle.
> > >
> > > The balancer is working in upmap mode with max deviation 1.
> > >
> > > I found this tool from DigitalOcean:
> > > https://github.com/digitalocean/pgremapper. Has anybody tried it
> > > before, and could it actually help here?
> > >
> > > Thank you for the ideas.
>
> --
> May the most significant bit of your life be positive.
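To throttle the recovery itself further, below is a minimal sketch of the
knobs touched on in this thread: the max misplaced ratio Eugen mentions plus
the limits Istvan already runs at 1, and, as an extra option that is not
suggested anywhere above, a recovery sleep for flash OSDs. The option names
are standard Ceph config options, but the values are examples only and
should be tuned (and later reverted) for the cluster at hand.

#!/usr/bin/env python3
"""Sketch of throttling recovery/backfill further via ceph config options.
Values are illustrative; osd_recovery_sleep_ssd is an addition of mine, not
something proposed in the thread."""
import subprocess


def ceph(*args: str) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(("ceph", *args), check=True,
                          capture_output=True, text=True).stdout


# Let the balancer keep at most ~1% of PGs misplaced at a time (default 5%).
ceph("config", "set", "mgr", "target_max_misplaced_ratio", "0.01")

# The limits the original poster already runs with.
ceph("config", "set", "osd", "osd_max_backfills", "1")
ceph("config", "set", "osd", "osd_recovery_max_active", "1")
ceph("config", "set", "osd", "osd_recovery_op_priority", "1")

# Optional and not from the thread: inject a pause between recovery ops on
# flash OSDs (default 0.0), which lowers the keys/s recovery rate at the
# cost of a longer total recovery time.
ceph("config", "set", "osd", "osd_recovery_sleep_ssd", "0.1")

# Watch the effect on the recovery line in the status output.
print(ceph("status"))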