Hi Janne,

thanks for looking at this. I'm afraid I have to flag this as rumor as well; you are basically stating it yourself:

> I could imagine that each PG had a certain amount of meta-operations, ...

So, yes, maybe. But for sure? Why would this not be proportional to IO operations? To object count? My own imagination is not good enough to see that PGs just do stuff for fun. A lot of code in ceph triggers bookkeeping and cleanup together with client ops, so it's a constant overhead per client op. Also, operations like database vacuuming are usually at least N log N, with N the size of the DB. Reducing the PG size should also reduce the per-PG database size, so after splitting by a factor of M the total cost would be M * (N/M) * log(N/M). That is an actual improvement, since

    M * (N/M) * log(N/M) = N * log(N/M) < N * log(N)    for M > 1

It is a good idea to collect such hypotheses, assuming that a dev drops by and can comment on them with background from the implementation. I just won't be satisfied with speculation this time around and will keep bugging.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Janne Johansson <icepic.dz@xxxxxxxxx>
Sent: Wednesday, October 9, 2024 11:20 AM
To: Frank Schilder
Cc: Anthony D'Atri; ceph-users@xxxxxxx
Subject: Re: Re: What is the problem with many PGs per OSD

> Thanks for chiming in. Unfortunately, it doesn't really help answering my questions either.
>
> Concurrency: A system like ceph that hashes data into PGs translates any IO into random IO anyways. So it's irrelevant for spinners; they have to seek anyways, and the degree of parallelism doesn't matter on systems with sufficient load. In addition, for OSDs at least up to pacific, the kv_sync_thread serializes everything (writes only?) anyways, so whatever concurrency more PGs add, this thread puts it back in sequence.
I could imagine that each PG had a certain amount of meta-operations, like logging, database vacuuming or reindexing and so on, that happen at some interval regardless of whether you access objects or not. In that case, the PG meta ops would scale with the number of PGs in the OSD but, as you state above, not with the number of objects, which of course stays more or less the same. If this were true, then going from 100 to 1000 PGs would increase these ops up to 10x while object IO would stay the same.

--
May the most significant bit of your life be positive.
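
To put the two hypotheses from this thread side by side, here is a toy Python sketch. It is only arithmetic under stated assumptions, not a model of what Ceph actually does: `vacuum_cost` assumes a maintenance pass costing n log n on each PG's share of the objects, and `fixed_overhead` assumes periodic per-PG work that is independent of object count. The function names, the object count `N` and the unit cost per PG are all hypothetical.

```python
import math


def vacuum_cost(total_objects: int, num_pgs: int) -> float:
    """Toy total cost of an n*log(n) maintenance pass run once per PG.

    Assumes objects are spread evenly, so each PG holds total_objects / num_pgs.
    """
    per_pg = total_objects / num_pgs
    return num_pgs * per_pg * math.log(per_pg)


def fixed_overhead(num_pgs: int, cost_per_pg: float = 1.0) -> float:
    """Toy total cost of periodic per-PG work that ignores object count."""
    return num_pgs * cost_per_pg


N = 1_000_000  # hypothetical number of objects on one OSD

for m in (1, 100, 1000):
    print(f"PGs={m:5d}  vacuum={vacuum_cost(N, m):14.0f}  fixed={fixed_overhead(m):8.1f}")
```

In this toy model the n log n term shrinks as the PG count grows, while the fixed per-PG term grows linearly with it. Which of the two dominates on a real OSD is exactly the open question here.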