Re: What is the problem with many PGs per OSD

Frank Schilder <frans@xxxxxx> · Wed, 9 Oct 2024 09:34:02 +0000

Hi Janne,

thanks for looking at this. I'm afraid I have to flag this as rumor as well; you are basically stating it yourself:

> I could imagine that each PG had a certain amount of meta-operations, ...

So, yes, maybe. But for sure? Why would this not be proportional to IO operations? To object count? My own imagination is not good enough to see why PGs would just do stuff for fun. A lot of code in ceph triggers bookkeeping and cleanup together with client ops, so it's a constant overhead per client op. Also, operations like database vacuuming are usually at least N log N, with N the size of the DB. Reducing the PG size should also reduce the PG database size, which would result in a total cost of M*(N/M)*log(N/M) after splitting by a factor of M, and we have an actual improvement since

  M*(N/M)*log(N/M) = N*log(N/M) < N*log(N)   for M > 1
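
As a quick sanity check of the inequality above, here is a minimal Python sketch. It assumes the N log N cost model mentioned above and an even split into M equal parts. The function names and the values of N and M are purely illustrative and are not taken from the ceph code base.

```python
import math

def vacuum_cost(n: float) -> float:
    """Assumed cost model: maintaining one DB of size n costs n * log(n)."""
    return n * math.log(n) if n > 1 else 0.0

def split_cost(n: float, m: int) -> float:
    """Total cost after splitting one DB of size n into m equal DBs."""
    return m * vacuum_cost(n / m)

N = 1_000_000  # illustrative DB size, not a measured value
for M in (1, 10, 100):
    ratio = split_cost(N, M) / vacuum_cost(N)
    print(f"M={M:4d}  cost relative to unsplit: {ratio:.3f}")
```

Under this model the relative cost shrinks as M grows, so splitting alone would not explain extra overhead.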

It is a good idea to collect such hypotheses, in the hope that a dev drops by and can comment on them with background from the implementation. I just won't be satisfied with speculation this time around and will keep bugging.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Janne Johansson <icepic.dz@xxxxxxxxx>
Sent: Wednesday, October 9, 2024 11:20 AM
To: Frank Schilder
Cc: Anthony D'Atri; ceph-users@xxxxxxx
Subject: Re:  Re: What is the problem with many PGs per OSD

> Thanks for chiming in. Unfortunately, it doesn't really help answering my questions either.
>
> Concurrency: A system like ceph that hashes data into PGs translates any IO into random IO anyways. So it's irrelevant for spinners, they have to seek anyways and the degree of parallelism doesn't matter on systems with sufficient load. In addition, for OSDs at least up to pacific the kv_sync_thread serializes everything (writes only?) anyways, so whatever concurrency more PGs add, this thread puts it back in sequence.

I could imagine that each PG had a certain amount of meta-operations,
like logging, database vacuuming or reindexing and so on that happens
at some intervals regardless of whether you access objects or not.
In that case, the PG meta ops would scale with the number of PGs in
the OSD but, as you state above, not with the number of objects, which
of course stays more or less the same. If this were true, then going
from 100 to 1000 PGs would make these ops up to 10x more while object
IO would stay the same.
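
To make the hypothesis concrete, here is a toy Python model. It only restates the assumption above: a fixed number of meta ops per PG per interval, independent of object IO. All names and numbers are made up for illustration, and nothing here reflects measured ceph behaviour.

```python
def osd_load(num_pgs: int, object_io_ops: int, meta_ops_per_pg: int):
    """Hypothetical model: meta ops scale with PG count, object IO does not."""
    meta_ops = num_pgs * meta_ops_per_pg
    return meta_ops, object_io_ops

# Illustrative numbers only.
for pgs in (100, 1000):
    meta, io = osd_load(pgs, object_io_ops=50_000, meta_ops_per_pg=5)
    print(f"PGs={pgs:5d}  meta ops={meta:6d}  object IO={io}")
```

Whether such per-PG periodic work actually exists is exactly the open question.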

--
May the most significant bit of your life be positive.