Dan can confirm, but this is what I believe is the main repo:

https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py

Bryan

From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Date: Friday, January 17, 2025 at 15:35
To: Stillwell, Bryan <bstillwe@xxxxxxxxxx>
Cc: Alexander Patrakov <patrakov@xxxxxxxxx>, Kasper Rasmussen <kasper_steengaard@xxxxxxxxxxx>, ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

That’s great to know, Bryan. I’ve seen multiple locations for the code out there, which one is canonical? (Lowercase c)

On Jan 17, 2025, at 3:46 PM, Stillwell, Bryan <bstillwe@xxxxxxxxxx> wrote:

The latest version (since September) switched to using the Python rados bindings, which not only fixes this problem but also makes it much faster. It also has a fix I made that orders the upmaps so that data is moved off of OSDs before trying to move data on to them. This helps a lot on clusters with EC pools.
Bryan

From: Alexander Patrakov <patrakov@xxxxxxxxx>
Date: Friday, January 17, 2025 at 09:53
To: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Cc: Kasper Rasmussen <kasper_steengaard@xxxxxxxxxxx>, ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

Hello Kasper,

Please be aware that the current "upmap-remapped" script is flaky. It might just refuse to work, with this message:

Error loading remapped pgs

This has been traced to the fact that "ceph pg ls remapped -f json" sets its stderr to non-blocking mode, and that is the same file descriptor to which jq (which follows in the pipeline) writes. Thus, jq can get -EAGAIN and terminate prematurely. The problem is tracked as https://tracker.ceph.com/issues/67505

Retrying the script might help.

What's worse is that the whole reason for adding jq to the upmap-remapped script is another Ceph bug: it sometimes outputs invalid JSON (containing a literal inf or nan instead of a number), and this became much more common with Reef, as new fields were added that are commonly equal to inf or nan.
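To illustrate the invalid-JSON problem described above: Python's own json module accepts NaN and Infinity by default, but it rejects a lowercase literal inf or nan. The sketch below is not taken from upmap-remapped. The helper name load_ceph_json and the sample input are made up for illustration, and it assumes the only defect is a bare inf/nan token outside of strings, which it rewrites to null before parsing.

```python
import json
import re

# Match either a complete JSON string (left untouched) or a bare inf/nan token,
# optionally preceded by a minus sign.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|-?\b(?:inf|nan)\b')


def sanitize(text):
    """Replace bare inf/nan tokens with null, leaving string contents alone."""
    def repl(match):
        token = match.group(0)
        return token if token.startswith('"') else "null"
    return _TOKEN.sub(repl, text)


def load_ceph_json(text):
    """Hypothetical helper: parse JSON that may contain literal inf/nan."""
    return json.loads(sanitize(text))


if __name__ == "__main__":
    # Made-up sample; real "ceph pg ls" output has many more fields.
    sample = '{"a": inf, "b": "nan inf", "c": -nan, "d": 1.5}'
    print(load_ceph_json(sample))
```

Mapping the values to null loses the distinction between inf and nan, which is probably acceptable when the affected fields are not used by the caller.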
The invalid-JSON bug is tracked as https://tracker.ceph.com/issues/66215 and has a fix merged in a not-yet-released version.

Maybe you should look into alternative tools, like https://github.com/digitalocean/pgremapper

On Fri, Jan 17, 2025 at 11:43 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
> > On Jan 17, 2025, at 6:02 AM, Kasper Rasmussen <kasper_steengaard@xxxxxxxxxxx> wrote:
> >
> > However I'm concerned with the amount of data that needs to be rebalanced, since the cluster holds multiple PB, and I'm looking for review of/input for my plan, as well as words of advice/experience from someone who has been in a similar situation.
>
> Yep, that’s why you want to use upmap-remapped. Otherwise the thundering herd of data shuffling will DoS your client traffic, esp. since you’re using spinners. Count on pretty much all data moving in the process, and the convergence taking … maybe a week?
>
> > On Pacific: Data is marked as "degraded", and not misplaced as expected. I also see above 2000% degraded data (but that might be another issue)
> >
> > On Quincy: Data is marked as misplaced - which seems correct.
>
> I’m not specifically familiar with such a change, but that could be mainly cosmetic, a function of how the percentage is calculated for objects / PGs that are multiply remapped.
>
> In the depths of time I had clusters that would sometimes show a negative number of RADOS objects to recover, it would bounce above and below zero a few times as it converged to 0.
>
> > Instead balancing has been done by a cron job executing - ceph osd reweight-by-utilization 112 0.05 30
>
> I used a similar strategy with older releases. Note that this will complicate your transition, as those relative weights are a function of the CRUSH topology, so when the topology changes, likely some reweighted OSDs will get much less than their fair share, and some will get much more. How full is your cluster (ceph df)? It might not be a bad idea to incrementally revert those all to 1.00000 if you have the capacity, and disable the cron job.
>
> You’ll also likely want to switch to the balancer module for the upmap-remapped strategy to incrementally move your data around. Did you have it disabled for a specific reason?
>
> Updating to Reef before migrating might be to your advantage so that you can benefit from performance and efficiency improvements since Pacific.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Alexander Patrakov
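The non-blocking stderr issue mentioned earlier in the thread can be reproduced in miniature. The sketch below does not involve ceph or jq. It assumes a POSIX system and uses a pipe with a duplicated descriptor to stand in for a stderr shared by two processes, and the function name demo is made up for illustration. It shows that O_NONBLOCK set through one descriptor is visible through the other, because both refer to the same open file description, and that a writer using the other descriptor can then hit EAGAIN.

```python
import fcntl
import os


def demo():
    """Return (flag_before, flag_after, hit_eagain) for a duplicated pipe fd."""
    read_end, write_end = os.pipe()
    # os.dup() gives a second descriptor for the SAME open file description,
    # similar to two processes in a pipeline inheriting one stderr.
    alias = os.dup(write_end)
    try:
        before = bool(fcntl.fcntl(alias, fcntl.F_GETFL) & os.O_NONBLOCK)

        # "Process A" switches its descriptor to non-blocking mode.
        flags = fcntl.fcntl(write_end, fcntl.F_GETFL)
        fcntl.fcntl(write_end, fcntl.F_SETFL, flags | os.O_NONBLOCK)

        # "Process B" never asked for it, yet its descriptor is non-blocking too.
        after = bool(fcntl.fcntl(alias, fcntl.F_GETFL) & os.O_NONBLOCK)

        # Nobody reads the pipe, so it fills up and the write fails with EAGAIN
        # (surfaced in Python as BlockingIOError) instead of blocking.
        hit_eagain = False
        try:
            while True:
                os.write(alias, b"x" * 4096)
        except BlockingIOError:
            hit_eagain = True
        return before, after, hit_eagain
    finally:
        for fd in (read_end, write_end, alias):
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

A program that does not expect EAGAIN on a write may treat it as a fatal error, which is consistent with the premature termination described for jq.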