Re: BlueStore Throttle Experiment

On Tue, 19 Mar 2019, Yiming Zhang wrote:
> Hi All,
> 
> We first need to find the saturation point for BlueStore. My current method is to measure the kv_queue size and kv_commit_lat in BlueStore. The experiments below show the kv_queue size and commit_lat under different client_queue_depth values. The block sizes are 4M and 4K.
> 
> Some questions about how to measure the saturation points in bluestore:
> 1. Is queue_transactions the interface for all transactions (metadata and 
> data)? If true, how do we control the ratio between metadata transactions 
> and data transactions? If not, which function handles the data transactions?

queue_transactions is the only entry point for writes.  We don't really 
know up front what the ratio looks like.  The cost function sort of infers 
it by counting ops vs the size of the write data payload, IIRC.
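To make the cost idea concrete, here is a minimal sketch of a per-transaction cost inferred from op count and payload size. The struct name, field names, and the per-io constant are illustrative assumptions, not the actual Ceph code (BlueStore's real accounting lives in its throttle/transaction machinery and its tunables):

```cpp
#include <cstdint>

// Illustrative sketch only: charge each transaction a fixed per-op
// overhead plus its data payload size, so the cost roughly tracks both
// small metadata-heavy transactions and large data writes.
struct TxcCost {
    uint64_t cost_per_io;  // hypothetical fixed overhead charged per op

    // total cost = per-op overhead * ops + size of the write payload
    uint64_t cost(uint64_t num_ops, uint64_t payload_bytes) const {
        return num_ops * cost_per_io + payload_bytes;
    }
};
```

With a scheme like this, a throttle can cap the sum of in-flight costs instead of counting transactions, so one 4M write and many 4K writes are weighted comparably.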
 
> 2. Is it right way to measure the bluestore saturation queue size by 
> measuring the kv_queue size?

I don't think it's related to kv_queue size, per se.  IIUC the only way to 
tell where the saturation point lies is to see whether giving it more work 
yields more throughput (or not).
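That "more work, more throughput?" test can be sketched as a simple scan over throughput samples taken at increasing queue depths. This is an assumed analysis helper, not Ceph code, and the 5% improvement threshold is an arbitrary choice for the sketch:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: given throughput measured at increasing client
// queue depths, return the index of the smallest depth beyond which the
// next step no longer improves throughput by at least min_gain.
std::size_t find_saturation_index(const std::vector<double>& throughput,
                                  double min_gain = 0.05) {
    for (std::size_t i = 0; i + 1 < throughput.size(); ++i) {
        // saturated once the next depth buys < min_gain extra throughput
        if (throughput[i + 1] < throughput[i] * (1.0 + min_gain))
            return i;
    }
    return throughput.empty() ? 0 : throughput.size() - 1;
}
```

For example, samples like {100, 180, 200, 204, 205} MB/s would flag the third depth as the knee, since the gains after it are marginal.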

> 3. Is the ratio between metadata and data transactions a fixed number or 
> dynamic number?

Dynamic and workload dependent.

> 4. We are trying to define the proper queue size in the OSD layer and 
> bluestore layer (see the graph below). The right vertical line is the max 
> bluestore queue size, beyond which we only see increases in latency. Is 
> there any good suggestion on what's the proper minimum bluestore queue 
> size (the left vertical line)?

I don't think it makes sense to have a minimum queue size; that would mean 
we would sit and wait for more work before writing anything.  In an idle 
cluster, that's clearly a bad idea, since a single IO wouldn't get 
processed immediately.  Perhaps in a loaded cluster we could induce a wait 
if we are confident more work is coming (e.g., because we've 
seen consistent load for a while), but I'm worried that will backfire in 
some cases, and it's not clear to me if/how that would actually improve 
things.  Maybe it would avoid the big/small pattern we see with commit 
batches, but even then I'm not sure the big/small pattern is bad per se...

sage
