On Tue, 19 Mar 2019, Yiming Zhang wrote:
> Hi All,
>
> We first need to find the saturation point for bluestore. My current
> method is to measure the kv_queue size and kv_commit_lat in bluestore.
> The experiments below show the kv_queue size and commit_lat under
> different client_queue_depth values. The block sizes are 4m and 4k.
>
> Some questions about how to measure the saturation point in bluestore:
> 1. Is queue_transactions the interface for all transactions (metadata and
> data)? If true, how to control the ratio between metadata trans and data
> trans? If not, which function handles the data transactions?

queue_transactions is the only entry point for writes. We don't really
know up front what the ratio looks like. The cost function sort of
infers it by counting ops vs the size of the write data payload, IIRC.

> 2. Is it the right way to measure the bluestore saturation queue size by
> measuring the kv_queue size?

I don't think it's related to kv_queue size, per se. IIUC the only way
to tell where the saturation point is is to see whether giving it more
work gets more throughput (or not).

> 3. Is the ratio between metadata and data transactions a fixed number or
> dynamic number?

Dynamic and workload dependent.

> 4. We are trying to define the proper queue size in OSD layer and
> bluestore layer (see graph below). The right vertical line is the max
> bluestore queue size where we only see increases in latency. Is there
> any good suggestion on what's the proper minimum bluestore queue size
> (the left vertical line)?

I don't think it makes sense to have a minimum queue size; that would
mean we would sit and wait for more work before writing anything. In an
idle cluster, that's clearly a bad idea, since a single IO wouldn't get
processed immediately.
Perhaps in a loaded cluster we could induce a wait if we are confident
more work is coming (e.g., because we've seen consistent load for a
while), but I'm worried that will backfire in some cases, and it's not
clear to me if/how that would actually improve things. Maybe it would
avoid the big/small pattern we see with commit batches, but even then
I'm not sure the big/small pattern is bad per se...

sage
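
For what it's worth, the "give it more work and see if throughput goes up" test from the answer to question 2 can be sketched roughly as below. This is only an illustration, not anything that exists in bluestore: the function name, the 5% min_gain threshold, and the sample numbers are all made up, and real measurements would need several runs per depth to average out noise.

```python
def find_saturation_depth(samples, min_gain=0.05):
    """Return the smallest queue depth after which more load stops helping.

    samples:  (queue_depth, throughput) pairs, one per benchmark run.
    min_gain: hypothetical threshold; a relative throughput gain below
              this is treated as "no more throughput".
    Returns None if throughput kept growing over the whole measured range.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one measurement")
    # Compare each run with the next-deeper one.
    for (depth, tput), (_, next_tput) in zip(ordered, ordered[1:]):
        if tput > 0 and (next_tput - tput) / tput < min_gain:
            return depth
    return None


# Made-up numbers purely for illustration (queue depth, MB/s).
example = [(1, 100.0), (2, 190.0), (4, 340.0),
           (8, 520.0), (16, 540.0), (32, 545.0)]
print(find_saturation_depth(example))
```

With these invented numbers, going from depth 8 to 16 gains under 5%, so 8 is reported as the saturation point; past that, extra queue depth would mostly show up as latency.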