Hello,

new firefly cluster, currently just one storage node with 8 OSDs (3TB HDDs, journals on 4 DC3700 SSDs) plus 3 mons; the rest of the storage nodes are still in the queue. Thus a replication of 1 for now.

Now this is the 2nd incarnation of this "cluster". I did a first one a few days ago and this did NOT happen then. No software was changed or updated in between, and I definitely didn't see this with my emperor cluster when I increased pg_num early in its life.

---
root at ceph-01:~# ceph osd pool set rbd pg_num 1024
Error E2BIG: specified pg_num 1024 is too large (creating 960 new PGs on ~8 OSDs exceeds per-OSD max of 32)
---

And indeed, when limiting it to 256 it worked (and so did further increases, albeit in steps of 256).

While I see _why_ one would want to limit things like this that could lead to massive data movement, when and where was this limit introduced?

Is it maybe triggered by data being present, even if that isn't actual Ceph data, like this:

---
     osdmap e86: 8 osds: 8 up, 8 in
      pgmap v444: 1152 pgs, 3 pools, 0 bytes data, 0 objects
            384 MB used, 22344 GB / 22345 GB avail
                1152 active+clean
---

Also, for the performance record, I tested this cluster with rados bench (write) and a block size of 4KB.
At 256 PGs (and PGPs, before somebody asks) it was capable of 1500 IOPS.
At 1024 PGs it was capable of 3500 IOPS, with clearly higher CPU usage, but very much within the capabilities of the machine.

Food for thought.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
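
P.S.: In case anyone wants to script the step-wise increase, it amounts to roughly the below. This is a sketch only: the pool name "rbd", the 1024 target and the 256-PG step are taken from my test above, and the sleeps are a crude stand-in for properly waiting until the new PGs are created and active+clean.

---
#!/bin/sh
# Bump pg_num (and pgp_num) towards a target in steps small enough to stay
# under the per-OSD split limit. Sketch only, values from the test above.
POOL=rbd
TARGET=1024
STEP=256

CUR=$(ceph osd pool get $POOL pg_num | awk '{print $2}')
while [ "$CUR" -lt "$TARGET" ]; do
    NEXT=$((CUR + STEP))
    [ "$NEXT" -gt "$TARGET" ] && NEXT=$TARGET
    ceph osd pool set $POOL pg_num $NEXT
    # crude: give the new PGs time to be created before touching pgp_num,
    # otherwise the cluster may refuse the pgp_num change
    sleep 30
    ceph osd pool set $POOL pgp_num $NEXT
    sleep 30
    CUR=$NEXT
done
---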
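P.P.S.: For reference, the benchmark invocation looks roughly like this. The 60 second runtime and the 32 concurrent ops are placeholder values, not necessarily what produced the numbers quoted above; only the 4KB block size and the pool are from the actual test.

---
# 4KB writes against the rbd pool; runtime (60s) and concurrency (-t 32)
# are illustrative, only the block size matches the test above.
rados -p rbd bench 60 write -b 4096 -t 32
---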