Hi,

Thanks so far for the suggestions.

We enabled the balancer first to make the PG distribution more even; after a few OSD additions/replacements and data growth it was no longer optimal. We enabled upmap mode, as this was suggested to be better than the default setting. To limit simultaneous data movement we set a maximum of 1% misplaced PGs and osd_max_backfills to 1. osd_max_backfills at 1 is our default; we only increase it if recovery slows down too much after some time.

Next is to increase the PGs of the biggest pool. We are quite low on PGs per OSD, so I'd say we should be able to increase this a bit. The problem is that it will move a lot of data around, and I'd like to prevent too much impact on our cluster.

We also noticed we forgot (....) to increase pgp_num on this big pool, so pg_num is 4k while pgp_num is 2k. Thus, the first thing we need to do is make these equal. Increasing it in one go means 50% of the data being moved: 500 TiB of user data, or 1.5 PiB of raw storage. Can we increase it in steps of, say, 200 until we hit 4k, to limit the amount of data being moved in a single go? Or is this not advisable? (A rough sketch of the commands we have in mind is at the end of this mail.)

Met vriendelijke groet, Kind Regards,

Maarten van Ingen
Specialist | SURF | maarten.vaningen@xxxxxxx | T +31 30 88 787 3000 | M +31 6 19 03 90 19
SURF (http://www.surf.nl/) is the collaborative organisation for ICT in Dutch education and research

On 07-02-2022 23:31, Mark Nelson <mnelson@xxxxxxxxxx> wrote:

On 2/7/22 12:34 PM, Alexander E. Patrakov wrote:
> Mon, 7 Feb 2022 at 17:30, Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>:
>> And keep in mind that when PGs are increased you may also need to
>> increase the number of OSDs, as one OSD should carry a max of around 200
>> PGs. But I do not know if that is still the case with current Ceph versions.
> This is just the default limit. Even Nautilus can do 400 PGs per OSD,
> given "mon max pg per osd = 400" in ceph.conf. Of course it doesn't
> mean that you should allow this.

There are multiple factors that play into how many PGs you can have per OSD. Some are tied to things like the pglog length (and associated memory usage), some are tied to the amount of PG statistics that gets sent to the mgr (the interval can be tweaked to lower this if you have many PGs), and some are tied to things like the pgmap size and mon limits. It's likely that for small clusters it may be possible to tweak things to support far more PGs per OSD (I've tested well over 1000/OSD on small clusters), while for extremely large clusters with many thousands of OSDs you may struggle to hit 100 PGs per OSD without tweaking settings. YMMV, which is why we have fairly conservative estimates for typical clusters.

Mark
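
For reference, a rough sketch of the commands this stepwise approach would involve, assuming a release where "ceph config set" is available and using a placeholder pool name "bigpool" (the pool name, step size and exact values are illustrative only, not taken from the cluster above):

    # current values and PG count per OSD
    ceph osd pool get bigpool pg_num
    ceph osd pool get bigpool pgp_num
    ceph osd df

    # throttles already mentioned in this thread
    ceph balancer mode upmap
    ceph balancer on
    ceph config set mgr target_max_misplaced_ratio 0.01   # max 1% misplaced
    ceph config set osd osd_max_backfills 1

    # raise pgp_num in steps of ~200 towards pg_num (4096), waiting for
    # backfill to finish (HEALTH_OK) before taking the next step
    ceph osd pool set bigpool pgp_num 2248
    # ...then 2448, 2648, and so on until pgp_num == pg_num

Depending on the Ceph release, the mgr may already ramp pgp_num toward the requested target in small increments bounded by target_max_misplaced_ratio, in which case the explicit small steps mainly add manual control over when each batch of data movement starts.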