On 01.10.21 at 16:52, Josh Baergen wrote:
Hi Peter,
When I checked for circles, I found that running the upmap balancer alone never seems to create
any kind of circle in the graph.
By a circle, do you mean something like this?
pg 1.a: 1->2 (upmap to put a chunk on 2 instead of 1)
pg 1.b: 2->3
pg 1.c: 3->1
Exactly. As far as I understand, the upmap balancer tries to remove upmap entries first, so I would expect that there will never be a circle like 1->2, 2->1, but I don't
see why it would not accidentally create a circle with more nodes involved.
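For illustration, a check like the one described could be sketched as follows. This is only a minimal sketch: it assumes the upmap entries have already been extracted into (pg, from_osd, to_osd) tuples, and the function name and input format are hypothetical, not part of Ceph or pgremapper.

```python
from collections import defaultdict


def find_upmap_cycle(upmaps):
    """Return one cycle of OSD ids as a list, or None if there is none.

    upmaps: iterable of (pg_id, from_osd, to_osd) tuples (assumed format).
    """
    # Directed graph: an edge from_osd -> to_osd for every upmap entry.
    graph = defaultdict(set)
    for _pg, src, dst in upmaps:
        graph[src].add(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GREY:
                # Back edge: the cycle is the tail of the current path.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(graph):
        if color[start] == WHITE:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None


example = [("1.a", 1, 2), ("1.b", 2, 3), ("1.c", 3, 1)]
print(find_upmap_cycle(example))
```

The depth-first search reports only the first circle it finds, which is enough to answer whether the upmap table contains one at all.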
If so, then it's not surprising that the upmap balancer wouldn't
create this situation by itself, since there's no reason for this set
of upmaps to exist purely for balance reasons. I don't think the
balancer needs any explicit code to avoid the situation because of
this.
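One way to see this: in a cycle every OSD gives up one chunk and receives one, so the per-OSD PG counts come out the same as with no upmaps at all. A rough sketch, assuming the same hypothetical (pg, from_osd, to_osd) tuple format and counting chunks only (PG sizes are ignored):

```python
from collections import Counter


def net_pg_delta(upmaps):
    """Return {osd: net change in PG chunk count} for non-zero changes only.

    upmaps: iterable of (pg_id, from_osd, to_osd) tuples (assumed format).
    """
    delta = Counter()
    for _pg, src, dst in upmaps:
        delta[src] -= 1  # chunk moved away from src
        delta[dst] += 1  # chunk placed on dst
    return {osd: d for osd, d in delta.items() if d != 0}


cycle = [("1.a", 1, 2), ("1.b", 2, 3), ("1.c", 3, 1)]
print(net_pg_delta(cycle))             # empty: the cycle moves nothing in net terms
print(net_pg_delta([("1.a", 1, 2)]))   # a single upmap does shift counts
```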
Running pgremapper + balancer created circles, sometimes with several dozen nodes. I would update the pgremapper docs
to warn about this and guide users to use undo-upmap to slowly remove the upmaps created by cancel-backfill.
This is again not surprising, since cancel-backfill will do whatever's
necessary to undo a set of CRUSH changes (and some CRUSH changes
regularly lead to movement cycles like this), and then using the upmap
balancer will only make enough changes to achieve balance, not undo
everything that's there.
It might be a nice addition to pgremapper to add an option to optimize the upmap table.
What I'm still missing here is the value in this. Are there
demonstrable problems presented by a large upmap exception table (e.g.
performance or operational)?
I have no evidence of how expensive upmap table entries are. They need to be synced to every client and therefore have a certain overhead.
Maybe someone with more knowledge of the internals can give some insight here. The whole idea of CRUSH is to not carry a table with exact
mappings around, and upmap entries are the exact opposite of this idea. Bottom line: every cyclic upmap definition is unnecessary
overhead, and I personally think it should be avoided.
Best,
Peter
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx