Re: pg autoscaler vs omap pools

Sage Weil <sweil@xxxxxxxxxx> · Fri, 22 Mar 2019 05:18:12 +0000 (UTC)

On Wed, 20 Mar 2019, Patrick Donnelly wrote:
> > Do you mean a min of 32 for *any* pool?  That would be a problem for pools
> > like device_health_metrics.
> 
> No, just omap-heavy pools.
> 
> > If we set a PG min on omap pools like rgw index and cephfs metadata, then
> > it's the same amount of "work" as setting a multiplier for those pools.
> > But I think you might be right that a min of 32 may make more sense
> > in those cases since we don't tend to have a zillion of them and we want
> > good distribution out of the gate when they are empty.
> >
> > I'm somewhat inclined to still have a multiplier, though, so that they
> > also continue to scale up when they get big...
> 
> The only question is whether it's really necessary for the multiplier
> to cause a small omap-heavy pool to go from e.g. 32 to 64 pgs after
> reaching the next byte threshold. Would going from 32 to 64 PGs have
> some real benefit? If so, then the multiplier should be chosen to
> maximize that benefit against cost of adding more PGs to a small pool.

The autoscaler works by calculating the "optimal" number of PGs and 
"rounding" to a power of 2.  If that is more than 3x off from the pool's 
actual pg_num, then it'll make an adjustment.  So in general, we tend to 
stick with the current pg_num unless it is really bad (it's sticky).  If a 
pool grows slowly and autoscales up, the pg_num will go up by 4x each time 
we make a change, since 2x is still within the 3x threshold and 4x is the 
first power of 2 beyond it.  In your example, it'd go from 32 to 128.
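
To make the stickiness concrete, here is a rough Python sketch of the 
decision described above.  It is not the actual pg_autoscaler module code: 
the function names, the tie-breaking in the rounding, and passing the 
"optimal" PG count in as a plain number are all assumptions made for 
illustration.  Only the 3x threshold and the power-of-2 rounding come from 
the description above.

```python
THRESHOLD = 3.0  # "more than 3x off" triggers an adjustment


def nearest_power_of_two(n: float) -> int:
    """Round n to the nearest power of two (ties go up; minimum 1).

    Hypothetical helper; the rounding rule is an assumption.
    """
    if n <= 1:
        return 1
    lower = 1 << (int(n).bit_length() - 1)  # largest power of two <= n
    upper = lower * 2
    return lower if (n - lower) < (upper - n) else upper


def next_pg_num(current: int, optimal: float,
                threshold: float = THRESHOLD) -> int:
    """Return the pg_num to use given the current value and the optimum.

    The current pg_num is kept unless the rounded target differs from it
    by more than `threshold` in either direction.
    """
    target = nearest_power_of_two(optimal)
    if target > current * threshold or target < current / threshold:
        return target
    return current


if __name__ == "__main__":
    # A pool at 32 PGs whose optimal PG count slowly grows.
    for optimal in (40, 60, 90, 100):
        print(optimal, "->", next_pg_num(32, optimal))
```

With these assumptions, a pool at 32 PGs stays at 32 while the rounded 
target is 64 (only 2x off) and changes only once the target reaches 128.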

sage