On 05/18/2015 02:29 AM, Wang, Zhiqiang wrote:
Hi all,
This is a follow-up to a previous discussion in the performance weekly meeting on proxy write performance. Several months ago, I tested proxy write with fio using a uniform random distribution; the gain was about 3.5x over the non-proxy-write case. However, a uniform random workload is not an ideal workload for cache tiering. I learned from Mark/Sage that fio can generate non-uniform random (zipf/pareto) workloads via some config options, so I repeated the evaluation with a zipf distribution. The results are reported below.
Configurations:
- 2 Ceph nodes, each with 1x Intel Xeon E3-1275 V2 @ 3.5GHz, 32GB memory, and a 10Gb NIC. Each node has 8 HDDs and 6 Intel DC S3700 SSDs
- 1 Ceph client with 2x Intel Xeon X5570 @ 2.93GHz, 128GB memory, and a 10Gb NIC, running 20 VMs; each VM runs fio against an RBD image
- Base pool: 16 HDD OSDs, with 4 SSDs serving as journals
- Cache pool: 8 SSD OSDs, with journals co-located on the same SSDs
- Data set size: 20x20GB
- Cache tier configuration: target_max_bytes 100GB, cache_target_dirty_ratio 0.4, cache_target_full_ratio 0.8, write recency 1 (see the setup commands sketched after this list)
- Code version: the proxy write code is at https://github.com/ceph/ceph/pull/3354. The non-proxy-write version is the same branch with the proxy write commits removed.
- Fio configuration: 4K random write, random_distribution zipf:1.1, ioengine libaio (a sketch of the job file is also included below)
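For reference, the cache tier above would be configured with commands along these lines. This is a sketch only: the pool names hddpool/ssdpool are placeholders, and the exact option name for the write recency setting depends on the code version being tested.

ceph osd tier add hddpool ssdpool
ceph osd tier cache-mode ssdpool writeback
ceph osd tier set-overlay hddpool ssdpool
ceph osd pool set ssdpool hit_set_type bloom
ceph osd pool set ssdpool target_max_bytes 107374182400      # 100GB
ceph osd pool set ssdpool cache_target_dirty_ratio 0.4
ceph osd pool set ssdpool cache_target_full_ratio 0.8
ceph osd pool set ssdpool min_write_recency_for_promote 1    # recency setting; option name may differ on this branch

And a minimal sketch of the fio job file run inside each VM. The filename and runtime are illustrative placeholders, not taken from the actual runs.

[global]
ioengine=libaio
direct=1
rw=randwrite
bs=4k
random_distribution=zipf:1.1
# iodepth was varied across runs: 1, 2, 4, 8, 16
iodepth=16
time_based=1
# runtime is illustrative; not specified in the original mail
runtime=300

[vm-disk]
# the RBD-backed disk inside the VM (placeholder path)
filename=/dev/vdb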
According to fio-genzipf, the top 10% of the data receives over 80% of the hits. I don't quite understand what the '-b' option means, and fio-genzipf core dumps if I pass 4096 for it, so I used 1000000, which seems to be the default.
# ./fio-genzipf -t zipf -i 1.1 -b 1000000 -g 400 -o 10
Generating Zipf distribution with 1.100000 input and 400 GB size and 1000000 block_size.
Rows            Hits %     Sum %      # Hits     Size
------------------------------------------------------
Top 10.00%      82.86%     82.86%     355883     331.44G
|-> 20.00%       4.13%     86.99%      17725      16.51G
|-> 30.00%       2.86%     89.85%      12278      11.43G
|-> 40.00%       1.58%     91.43%       6781       6.32G
|-> 50.00%       1.43%     92.85%       6139       5.72G
|-> 60.00%       1.43%     94.28%       6139       5.72G
|-> 70.00%       1.43%     95.71%       6139       5.72G
|-> 80.00%       1.43%     97.14%       6139       5.72G
|-> 90.00%       1.43%     98.57%       6139       5.72G
|-> 100.00%      1.43%    100.00%       6134       5.71G
------------------------------------------------------
Performance results:
- Without proxy write:
  QD             1         2         4         8         16
  IOPS           190       200       203       201       198
  Latency (ms)   100.43    191.9     380.57    775.15    1600.66

- With proxy write:
  QD             1         2         4         8         16
  IOPS           902       896       1067      1207      1486
  Latency (ms)   22.08     44.43     74.83     133.12    217.42
As you can see above, proxy write improves IOPS from ~200 to between ~900 (QD 1) and ~1500 (QD 16), and reduces latency by roughly 80% at every queue depth.
Any comments/feedback are welcome.
Excellent testing, Zhiqiang! It looks like proxy write helps
dramatically. Do you happen to know what the performance of the base
pool is without the cache tier involved? That was the big problem we
saw back in firefly. We didn't have proxy write then, but here are the
results we saw for 4K random writes and 4K random zipf 1.2 writes in 4
different pool configurations (base, base + SSD journal, base + SSD
tiering, SSD as base):
http://nhm.ceph.com/librbdfio-tiering-tests2/randwrite%204K-iops.png
That graph is a bit difficult to make out, but the gist is that in
firefly, a spinning-disk base pool with an SSD cache pool was
significantly slower than either the base pool or the SSD cache pool
used independently. This was true both for 4K random writes and for 4K
random zipf 1.2 writes, and it was primarily limited by the sequential
write throughput of the cache tier due to the number of 4MB object
promotions.
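As a back-of-envelope illustration (the bandwidth figure here is assumed, not measured): promoting a full 4MB object to satisfy a 4KB write miss is roughly a 1000x write amplification. If the cache pool can sustain on the order of 1-2 GB/s of aggregate sequential writes (less with journals sharing the same SSDs), that allows only a few hundred promotions per second, so once the miss rate is non-trivial the promotion traffic, rather than the client 4K writes, sets the ceiling.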
Mark