On Tue, 19 Mar 2019, Patrick Donnelly wrote:
> > > This would effectively be hacking an IOP bias in for omap-centric
> > > pools, on the data size-versus-heat tradeoff we've always had to deal
> > > with. A more generic solution would be for the balancer to explicitly
> > > account for both data size and IO activity. Right?
> >
> > Yeah, that's a better way to frame it!
> >
> > > Now, this is definitely not necessarily a better solution for the
> > > needs we have, but are we comfortable taking the quick hack instead?
> > > Especially since the omap data is often on a different set of drives,
> > > I'm not totally sure we need a size-based equalizer...
> >
> > So, instead of a configurable for an omap multiplier, perhaps we
> > have a per-pool property that is an IOPS bias (e.g., 10x in this case). I
> > think this is a situation where we don't/can't automagically determine
> > that bias by measuring workload, because workload and heat are ephemeral
> > while placement and rebalancing are hugely expensive. We wouldn't want to
> > adjust placement automatically.
> >
> > How about a pool property pg_autoscale_bias, and we have rgw and cephfs
> > set it automatically somewhere on the appropriate pools?
>
> I'm wondering if the bias is really necessary if we can just set
> pg_num_min at file system metadata pool / rgw index pool creation (or
> before turning on the autoscaler)? I would think that the difference
> between e.g. 32 PGs versus 64 PGs will not be significant for a
> metadata pool in terms of recovery or performance when we're looking
> at only a hundred or so gigabytes of omap data. The difference between
> 4 PGs and 32 PGs *is* significant, though. So maybe setting a
> reasonable min is enough?

Do you mean a min of 32 for *any* pool?  That would be a problem for
pools like device_health_metrics.  If we set a PG min on omap pools like
the rgw index and cephfs metadata, then it's the same amount of "work"
as setting a multiplier for those pools.
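(For concreteness, the min-based approach amounts to something like the
following before the autoscaler kicks in.  Pool names here are just
examples, and the pg_num_min property is the one under discussion, not
necessarily what ships:)

```shell
# Hypothetical sketch: give the omap-heavy pools a PG floor before
# enabling the autoscaler, so they start with decent distribution.
ceph osd pool set cephfs_metadata pg_num_min 32
ceph osd pool set default.rgw.buckets.index pg_num_min 32
```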
But I think you might be right that a min of 32 may make more sense in
those cases, since we don't tend to have a zillion of them and we want
good distribution out of the gate when they are empty.  I'm somewhat
inclined to still have a multiplier, though, so that they also continue
to scale up when they get big...

sage
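To make the tradeoff concrete, here is a rough back-of-the-envelope
sketch (not the actual autoscaler code; the formula, names, and
parameters are illustrative) of how a bias multiplier and a pg_num_min
floor would each move the target PG count for a ~100 GB metadata pool in
a 1 PB, 100-OSD cluster:

```python
# Hypothetical sketch of how a pg_autoscale_bias multiplier and a
# pg_num_min floor could combine in a size-based target calculation.

def next_power_of_two(n):
    # Smallest power of two >= n.
    p = 1
    while p < n:
        p *= 2
    return p

def target_pg_num(pool_bytes, total_bytes, osd_count,
                  target_pg_per_osd=100, bias=1.0, pg_num_min=1):
    # Size-based share of the cluster-wide PG budget, scaled by the bias,
    # rounded up to a power of two, then clamped to the floor.
    share = pool_bytes / total_bytes if total_bytes else 0.0
    raw = share * osd_count * target_pg_per_osd * bias
    return max(next_power_of_two(max(1, round(raw))), pg_num_min)

# ~100 GB metadata pool in a 1 PB cluster with 100 OSDs:
print(target_pg_num(100e9, 1e15, 100))              # size alone -> 1 PG
print(target_pg_num(100e9, 1e15, 100, bias=10))     # 10x bias   -> 16 PGs
print(target_pg_num(100e9, 1e15, 100, pg_num_min=32))  # floor    -> 32 PGs
```

The floor fixes the empty-pool case immediately, while the bias also
keeps scaling the pool up as it grows, which is the distinction drawn
above.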