On Thu, Aug 28, 2014 at 10:47 PM, Christian Balzer <chibi at gol.com> wrote:

>> There are 1328 PG's in the pool, so about 110 per OSD.
>>
> And just to be pedantic, the PGP_NUM is the same?

Ah, "ceph status" reports 1328 pgs. But:

$ sudo ceph osd pool get rbd pg_num
pg_num: 1200
$ sudo ceph osd pool get rbd pgp_num
pgp_num: 1200

Now, 1200 is not a power of two, but it makes sense (12 x 100). We probably forwent the power of two because it was already such a huge increase and we were erring large. Apparently the 1328 figure includes 128 PGs for the (unused, in our case) data and metadata pools.

> Since you can't go down, the only way is up. To 2048
> See it as an early preparation step towards the time when you reach 48
> OSDs. ^o^

Demand for this cluster exceeds all estimates and plans, so that may happen (much) sooner than expected!

To start with, I bumped the 1200 PGs to 1280, figuring that it was at least power-of-twoier (tm) than 1200 and that I could then add 256 at a time.

However, the increase to 1280 caused several OSDs to spike above 85% utilization and wedged a bunch of PGs in active+remapped+backfill_toofull. To fix it, I had to set "osd backfill full ratio = 0.90" in ceph.conf and manually restart all the OSDs. That was pretty unsettling on a production cluster, so I'm definitely hesitant to raise pg_num any further if there's any chance the increase could push individual OSDs over 90%.

It's just so frustrating to have one OSD at 74% and another at 88% and be getting "near full" warnings as a result. The data could just shift over a little and everything would be fine. Feels like Happy Gilmore: "Why don't you just go home? That's your home!! Are you too good for your home?!?"

Thanks!
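
P.S. For anyone who runs into the same thing: the bump itself was done with the standard pool-set commands, something like

$ sudo ceph osd pool set rbd pg_num 1280
$ sudo ceph osd pool set rbd pgp_num 1280

and the backfill workaround was just that one line added to ceph.conf (under [osd] should do it) on each node before restarting the OSDs:

[osd]
    osd backfill full ratio = 0.90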