RE: BlueStore Performance issue

Sage Weil <sweil@xxxxxxxxxx> · Wed, 9 Mar 2016 10:38:19 -0500 (EST)

On Wed, 9 Mar 2016, Allen Samuels wrote:
> > Stage 3 is a serious bottleneck in that it guarantees that you will
> > never exceed QD=1 for your logging device. We believe there is no need
> > to serialize the KV commit operations.
> 
> It's potentially a bottleneck, yes, but it's also what keeps the commit 
> rate self-throttling.  If we assume that there are generally lots of 
> other IOs in flight because not every op is metadata-only, the QD will be 
> higher.
>
> If it's a separate log device, though, yes.. it will have QD=1.  In 
> those situations, though, the log device is probably faster than the 
> other devices, and a shallow QD probably isn't going to limit 
> throughput--just marginally increase latency?
> 
> [Allen] No, a shallow queue depth will directly impact BW on many 
> (most?/all?) SSDs. I agree that in a hybrid model (DB on flash, data on 
> HDD) that the delivered performance delta may not be large. As for the 
> throttling, we haven't focused on that area yet (just enough to put it 
> on the list of future things to investigate).

FWIW in the single-device non-hybrid case, the QD is 1 for *kv* IOs, but 
there will generally be a whole bunch of non-kv reads and writes also in 
flight.  I wouldn't expect us to ever actually have a QD of 1 unless 
*every* operation is pure-kv (say, omap operations).

For example, say we have 4 KB random writes, and the QD at the OSD level 
is 64.  In that case, BlueStore should have up to 64 4 KB aio writes and 0 
to 1 kv writes (each of size up to 64 * whatever a txn adds) in flight.
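
A rough sketch of that accounting, in Python purely for illustration. The
function name and the split of ops between stages are assumptions, not
BlueStore code. It assumes each OSD-level op is either in its data aio
write or queued behind the single serialized kv commit:

```python
def device_queue_depth(osd_qd, ops_waiting_on_kv):
    """Estimate device-level IOs in flight for a given OSD-level queue depth.

    Assumed model: every op not waiting on the kv commit has one aio write
    in flight; all ops waiting on the kv commit share a single kv write.
    """
    if not 0 <= ops_waiting_on_kv <= osd_qd:
        raise ValueError("ops_waiting_on_kv must be between 0 and osd_qd")
    aio_writes = osd_qd - ops_waiting_on_kv
    kv_writes = 1 if ops_waiting_on_kv > 0 else 0  # kv commits are serialized
    return aio_writes + kv_writes


# OSD-level QD of 64 with 0, 32, or all 64 ops parked behind the kv commit.
for waiting in (0, 32, 64):
    print(waiting, device_queue_depth(64, waiting))
```

Under this model the device only sees a QD of 1 when every op is waiting
on the kv commit.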

My intuition says that funneling the txn commits like this will in 
practice maybe halve the effective QD at the device (compared to the QD at 
the OSD)... does that seem about right?  Maybe it'd stay about the same, 
since 1 OSD IO is actually between 1 and 2 device IOs (the aio write + the 
[maybe batched] txn commit).
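
The "between 1 and 2 device IOs" estimate can be written out as a small
calculation. This is only a sketch: txns_per_kv_commit is a hypothetical
parameter for how many txn commits get batched into one kv write, and the
model assumes exactly one aio write per OSD IO:

```python
def device_ios_per_osd_io(txns_per_kv_commit):
    """Device IOs generated per OSD IO under the assumed model.

    Each OSD IO issues one aio write plus an equal share of one kv commit
    that is batched across txns_per_kv_commit transactions.
    """
    if txns_per_kv_commit < 1:
        raise ValueError("txns_per_kv_commit must be at least 1")
    return 1 + 1 / txns_per_kv_commit


# No batching, moderate batching, and a full batch at an OSD-level QD of 64.
for batch in (1, 8, 64):
    print(batch, device_ios_per_osd_io(batch))
```

With no batching every OSD IO costs 2 device IOs; as the batch grows the
cost approaches 1.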

sage