Quoting Sean Matheny (s.matheny@xxxxxxxxxxxxxx):

> I tested this out by setting norebalance and norecover, moving the
> host buckets under the rack buckets (all of them), and then
> unsetting. Ceph starts melting down with escalating slow requests,
> even with backfill and recovery parameters set to throttle. I moved
> the host buckets back to the default root bucket, and things mostly
> came right, but I still had some inactive / unknown PGs that I had to
> restart some OSDs to get back to HEALTH_OK.
>
> I'm sure there's a way you can tune things or fade in CRUSH weights
> or something, but I'm happy just moving one at a time.

For big changes like this you can use Dan's upmap trick:

https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer

Python script:

https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py

This way you can pause the process, or get back to HEALTH_OK whenever
you want to.

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
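
P.S. For the archives, the upmap trick from the slides above boils
down to roughly the following sequence. This is only a sketch (the
example bucket names are made up, and you should verify the flags and
the script's pipe-to-shell usage against your Ceph release before
running anything):

    # upmap requires Luminous or newer clients:
    ceph osd set-require-min-compat-client luminous

    # Prevent data movement while restructuring the CRUSH map:
    ceph osd set norebalance

    # Make the big change, e.g. move a host under a rack bucket
    # (hypothetical names):
    ceph osd crush move host1 rack=rack1

    # The script emits 'ceph osd pg-upmap-items' commands that pin
    # every remapped PG back to its current OSDs, so the cluster
    # returns to HEALTH_OK without moving data yet:
    ./upmap-remapped.py | sh

    ceph osd unset norebalance

    # Then let the mgr balancer remove those upmap entries gradually,
    # moving data at a controlled pace:
    ceph balancer mode upmap
    ceph balancer on

The point is that all the data movement then happens under the
balancer's throttling, and you can pause it at any time by turning the
balancer off.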