On 06-08-15 10:16, Hector Martin wrote:
> We have 48 OSDs (on 12 boxes, 4T per OSD) and 4 pools:
> - 3 replicated pools (3x)
> - 1 RS pool (5+2, size 7)
>
> The docs say:
> http://ceph.com/docs/master/rados/operations/placement-groups/
> "Between 10 and 50 OSDs set pg_num to 4096"
>
> Which is what we did when creating those pools. This yields 16384 PGs
> over 48 OSDs, which sounded reasonable at the time: 341 per OSD.
>

The amount of PGs is cluster-wide and not per pool. So if you have 48 OSDs
the rule of thumb is: 48 * 100 / 3 = 1600 PGs cluster-wide.

Now, with enough memory you can easily have 100 PGs per OSD, but keep in
mind that the PG count is cluster-wide and not per pool.

Wido

> However, upon upgrade to Hammer, it started complaining:
>      health HEALTH_WARN
>             too many PGs per OSD (1365 > max 300)
>
> It seems the actual math multiplies everything by the size of the pools
> (which in retrospect makes sense): (3*4096*3 + 1*4096*7) / 48 = 1365
>
> And Hammer by default sets:
> mon_pg_warn_max_per_osd = 300
>
> For now I'm just going to bump up the setting to make the warning go
> away, but I'm concerned about the implications of this. Two of the 3x
> pools are not production and I can nuke and re-create them (with 512 PGs
> instead? Does that sound reasonable?), but the RS pool and the other rep
> pool are, and there's no simple way for us to re-create them at this
> point (though that might be a good excuse to develop something that
> would enable that - which might be doable-ish for the RS pool at least,
> which is the biggest offender).
>
> Questions:
> - Does this mean that the docs are wrong and need fixing? It seems that
>   blindly following the docs can easily yield per-OSD PG counts that are
>   off by a factor of 5 from the max, without doing anything too weird
>   (just 4 reasonably simple pools).
> - Should I be concerned about the performance impact? How was the value
>   300 arrived at?
> - We're going to be using this cluster for more things (services), which
>   means creating more pools. Should I plan ahead for, say, a time when we
>   have 12 pools on it, and divide everything by 12? The cluster is
>   currently very overprovisioned for space, so we're probably not going
>   to be adding OSDs for quite a while, but we'll be adding pools.
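
PS: in case it helps anyone checking their own numbers, here is a quick
back-of-the-envelope Python sketch (my own, not an official Ceph tool) that
redoes the per-OSD PG math the Hammer warning is based on. The OSD count and
pool list below are just the ones from Hector's mail; nothing here talks to a
live cluster.

#!/usr/bin/env python
# Back-of-the-envelope check of PGs per OSD (not an official Ceph tool).
# Cluster from the mail above: 48 OSDs, three 3x replicated pools and one
# 5+2 erasure-coded pool (size 7), all created with pg_num = 4096.

num_osds = 48
pools = [
    (4096, 3),  # replicated pool, size 3
    (4096, 3),  # replicated pool, size 3
    (4096, 3),  # replicated pool, size 3
    (4096, 7),  # EC pool 5+2, so 7 shards per PG
]

# Every PG places 'size' copies/shards on OSDs, so the per-OSD load is
# sum(pg_num * size) / number of OSDs -- the figure the warning compares
# against mon_pg_warn_max_per_osd.
pg_copies = sum(pg_num * size for pg_num, size in pools)
print("PG copies cluster-wide: %d" % pg_copies)              # 65536
print("PGs per OSD: %.0f" % (pg_copies / float(num_osds)))   # ~1365

# Rule of thumb (~100 PGs per OSD), solved for total pg_num at size 3:
print("Suggested total pg_num at 3x: %d" % (num_osds * 100 // 3))  # 1600

If you add pools later, just extend the list; as long as the "PGs per OSD"
figure stays under mon_pg_warn_max_per_osd, the warning should stay away.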