Hello,

new firefly cluster, currently just one storage node with 8 OSDs (3TB HDDs, journals on 4 DC3700 SSDs) plus 3 mons; the rest of the storage nodes are still in the queue. Thus a replication of 1 for now.

Now this is the 2nd incarnation of this "cluster". I did a first one a few days ago and this did NOT happen then. No software was changed or updated in between, and I definitely didn't see this with my emperor cluster when I increased pg_num early in its life.

---
root at ceph-01:~# ceph osd pool set rbd pg_num 1024
Error E2BIG: specified pg_num 1024 is too large (creating 960 new PGs on ~8 OSDs exceeds per-OSD max of 32)
---

And indeed, when limiting it to 256 it worked (and so did further increases, albeit in steps of 256).

While I see _why_ one would want to limit things like this that could lead to massive data movement, when and where was this limit introduced?

Is it maybe triggered by data being present, even if that isn't actual Ceph data, like this:

---
     osdmap e86: 8 osds: 8 up, 8 in
      pgmap v444: 1152 pgs, 3 pools, 0 bytes data, 0 objects
            384 MB used, 22344 GB / 22345 GB avail
                1152 active+clean
---

Also, for the performance record, I tested this cluster with rados bench (write) and a block size of 4KB.
At 256 PGs (and PGPs, before somebody asks) it was capable of 1500 IOPS.
At 1024 PGs it was capable of 3500 IOPS, with clearly higher CPU usage, but very much within the capabilities of the machine.

Food for thought.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
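
P.S.: In case anyone wants to script the step-wise increase, it amounts to roughly the below. This is a sketch only: the pool name "rbd", the 1024 target and the 256-PG step are taken from my test above, and the sleeps are a crude stand-in for properly waiting until the new PGs are created and active+clean.

---
#!/bin/sh
# Bump pg_num (and pgp_num) towards a target in steps small enough to stay
# under the per-OSD split limit. Sketch only, values from the test above.
POOL=rbd
TARGET=1024
STEP=256

CUR=$(ceph osd pool get $POOL pg_num | awk '{print $2}')
while [ "$CUR" -lt "$TARGET" ]; do
    NEXT=$((CUR + STEP))
    [ "$NEXT" -gt "$TARGET" ] && NEXT=$TARGET
    ceph osd pool set $POOL pg_num $NEXT
    # crude: give the new PGs time to be created before touching pgp_num,
    # otherwise the cluster may refuse the pgp_num change
    sleep 30
    ceph osd pool set $POOL pgp_num $NEXT
    sleep 30
    CUR=$NEXT
done
---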
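P.P.S.: For reference, the benchmark invocation looks roughly like this. The 60 second runtime and the 32 concurrent ops are placeholder values, not necessarily what produced the numbers quoted above; only the 4KB block size and the pool are from the actual test.

---
# 4KB writes against the rbd pool; runtime (60s) and concurrency (-t 32)
# are illustrative, only the block size matches the test above.
rados -p rbd bench 60 write -b 4096 -t 32
---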