On 6/1/20 11:58 PM, Yiming Zhang wrote:
On Jun 1, 2020, at 5:10 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
Hi Yiming,
Are you changing the overall data set size when you change the image
size? I.e., in your 40GB image test, is your data set 40x larger than
in your 1GB image test?
I’m using the same workload:
rw=randwrite
bs=4096
time_based=1
runtime=300
direct=1
iodepth=48
Both runs have the same runtime of 300s.
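For reference, a complete fio job file built around these parameters would look something like the sketch below; the rbd engine settings and the pool/image names are placeholders on my side, not taken from the actual setup:

    [global]
    ioengine=rbd        # fio's librbd engine, bypasses the kernel RBD client
    clientname=admin    # cephx user; assumes the default admin keyring
    pool=rbd            # placeholder pool name
    rbdname=image0      # placeholder image name
    rw=randwrite
    bs=4096
    direct=1
    time_based=1
    runtime=300
    iodepth=48

    [rbd-randwrite]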
Ok, but are you doing the randwrite workload across the entire image in
both cases? If so, that will be many more objects you are spanning
writes across for the 40GB image vs the 1GB image.
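To put rough numbers on it, with the default 4MB RBD object size (order 22):

    # 1GB image  ->  1024MB / 4MB =   256 objects
    # 40GB image -> 40960MB / 4MB = 10240 objects
    # Confirm the object size of an image (pool/image names are placeholders):
    rbd info rbd/image0 | grep order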
That would have various effects, including changing the number of
onodes in the cache and the potential for cache misses hitting
rocksdb and eventually the disk. Having said that, with the default
4GB memory target I wouldn't expect you to have cache misses with
typical RBD workloads even with a 40GB dataset on a single OSD unless
you've tweaked the object size to be smaller or caused additional
metadata per object in some way (EC, etc).
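One way to sanity-check that is to watch the onode cache counters in a perf dump; a sketch, assuming Octopus-era counter names:

    ceph daemon osd.0 perf dump bluestore | grep onode
    # bluestore_onodes                     -> onodes currently held in the cache
    # bluestore_onode_hits / onode_misses  -> whether lookups fall through to rocksdb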
Theoretically you might be able to use lttng or jaeger tracepoints to track latency, or possibly look at the perf counters. Otherwise you might also be able to see something through wallclock profiling.
I tried the gdb wallclock profiling, but I can only see the fio- and OSD-related time; it doesn't cover the BlueStore internals. For details please see here <https://pastebin.com/6UiLRGvY>.
I added a bunch of perf counters in BlueStore to track the latencies, and I don't see any suspicious ones. As for locking behavior, what are the possible reasons for that? I'd really appreciate it if you could point me to which lock you mean in kv_sync_thread.
It looks to me like you ran it against the client fio process rather
than the OSD?
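If you rerun it, attach to the ceph-osd process instead. With the gdbpmp script (https://github.com/markhpc/gdbpmp) that's roughly the following, assuming a single OSD on the node:

    pid=$(pgrep -x ceph-osd)                     # assumes one ceph-osd on this node
    ./gdbpmp.py -p "$pid" -n 1000 -o osd.gdbpmp  # collect 1000 wallclock samples from the OSD
    ./gdbpmp.py -i osd.gdbpmp -t 1               # print the merged call tree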
Thanks,
Yiming
I would probably look carefully at things happening in the kv sync
thread since this is a random write workload and that's where I'd
expect to see blocking behavior that could cause latency spikes like
this.
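For example, the kv sync related latency counters can be pulled like this (a sketch; counter names are from the Octopus-era code and may differ in other releases):

    ceph daemon osd.0 perf dump bluestore | egrep 'kv_flush_lat|kv_sync_lat|kv_commit_lat|state_kv'
    # kv_flush_lat / kv_sync_lat / kv_commit_lat    -> time spent flushing/committing to rocksdb
    # state_kv_queued_lat / state_kv_committing_lat -> how long transactions wait on the kv queue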
Mark
On 6/1/20 1:50 PM, Yiming Zhang wrote:
Hi All,
I have noticed that different RBD image sizes can shape the BlueStore latency differently. Is there a baseline or guidance for choosing the image size?
Left: RBD image size is 1GB
Middle: RBD image size is 40GB
Right: RBD image size is 1GB, RocksDB write buffer 10X default
4K randwrite on SSD with fio. The SSD is preconditioned and the image is prefilled (20 mins).
Red dot is L1 compaction and green dot is L0 compaction.
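(A write-buffer bump like the one in the right-hand run would typically go through bluestore_rocksdb_options; a sketch below, where the option string is assumed from the defaults of that era with only write_buffer_size raised to 10x, i.e. 2684354560 bytes. OSDs need a restart for rocksdb options to take effect.)

    ceph config set osd bluestore_rocksdb_options \
      "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=2684354560,writable_file_max_buffer_size=0,compaction_readahead_size=2097152"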
Let’s focus on the left graph. The smaller spikes are caused by compactions. The higher spikes seem to be caused by BlueStore itself.
I suspect this could be related to the RBD image size in some way.
Does anyone know what could be the cause of the higher spikes? And how to debug it?
Also, what is the proper RBD image size for my test?
Please advise.
Thanks,
Yiming
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx