Re: Slow rbd reads (fast writes) with luminous + bluestore

On Mon, Aug 13, 2018 at 9:32 AM Emmanuel Lacour <elacour@xxxxxxxxxxxxxxx> wrote:
On 13/08/2018 at 15:21, Jason Dillaman wrote:
> Is this a clean (new) cluster and RBD image you are using for your
> test or has it been burned in? When possible (i.e. it has enough free
> space), bluestore will essentially turn your random RBD image writes
> into sequential writes. This optimization doesn't work for random
> reads unless your read pattern matches your original random write
> pattern.

The cluster is new but already hosts some VM images. It is not yet used
in production, but it already holds data and has seen writes and reads.

>
> Note that with the default "stupid" allocator, this optimization will
> at some point hit a massive performance cliff because the allocator
> will aggressively try to re-use free slots that best match the IO
> size, even if that means it will require massive seeking around the
> disk. Hopefully the "bitmap" allocator will address this issue once it
> becomes the stable default in a future release of Ceph.

Well, but surely not as bad as what I see here:

New cluster
=======


file1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
fio-2.16
Starting 1 process
file1: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [r(1)] [100.0% done] [876KB/0KB/0KB /s] [219/0/0 iops] [eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=3289045: Mon Aug 13 14:58:22 2018
  read : io=16072KB, bw=822516B/s, iops=200, runt= 20009msec

An old cluster with fewer disks and older hardware, running Ceph Hammer
============================================

file1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
fio-2.16
Starting 1 process
file1: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=0): [f(1)] [100.0% done] [6350KB/0KB/0KB /s] [1587/0/0 iops] [eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=15596: Mon Aug 13 14:59:22 2018
  read : io=112540KB, bw=5626.8KB/s, iops=1406, runt= 20001msec



So around 7 times fewer IOPS :(
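
As a sanity check on the "around 7 times" figure, the numbers in the two fio summaries above can be cross-checked against each other. This is only a sketch: it assumes fio's "KB" means 1024 bytes here (which is consistent with io=16072KB over 20.009 s giving 822516 B/s), and the variable names are made up for illustration.

```python
# Figures copied from the two fio summaries above (bs=4K random reads).
BLOCK_SIZE = 4096  # bytes per read, from bs=4K

new_bw_bytes_per_s = 822516   # new cluster: bw=822516B/s
new_iops_reported = 200       # new cluster: iops=200
old_bw_kib_per_s = 5626.8     # old cluster: bw=5626.8KB/s (assumed KiB)
old_iops_reported = 1406      # old cluster: iops=1406

# IOPS implied by the reported bandwidth; should agree with fio's own iops.
new_iops_from_bw = new_bw_bytes_per_s / BLOCK_SIZE
old_iops_from_bw = old_bw_kib_per_s * 1024 / BLOCK_SIZE

# How many times more 4K random reads per second the old cluster delivers.
ratio = old_iops_reported / new_iops_reported

print(f"new: {new_iops_from_bw:.1f} IOPS from bandwidth, {new_iops_reported} reported")
print(f"old: {old_iops_from_bw:.1f} IOPS from bandwidth, {old_iops_reported} reported")
print(f"old/new IOPS ratio: {ratio:.2f}")
```

The bandwidth and IOPS lines agree with each other on both clusters, so the gap is in the measured rate itself and not a reporting artifact.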

With rados bench, the new cluster gets better results:

New:

Total time run:       10.080886
Total reads made:     3724
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1477.65
Average IOPS:         369
Stddev IOPS:          59
Max IOPS:             451
Min IOPS:             279
Average Latency(s):   0.0427141
Max latency(s):       0.320013
Min latency(s):       0.00142682


Old:

Total time run:       10.276202
Total reads made:     724
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   281.816
Average IOPS:         70
Stddev IOPS:          5
Max IOPS:             76
Min IOPS:             59
Average Latency(s):   0.226087
Max latency(s):       0.981571
Min latency(s):       0.00343391


So the problem seems to be located on the "rbd" side ...

That's a pretty big apples-to-oranges comparison (4KiB random IO versus 4MiB full-object IO). With your RBD workload, the OSDs will be seeking after each 4KiB read, but with your RADOS bench workload, each read covers a full 4MiB object before the next seek.
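
To put numbers on why the two tests are not comparable, here is a rough sketch that converts each result quoted in this thread into throughput and into operations needed per MiB transferred. The helper names are hypothetical, and treating one operation as roughly one seek is only a simplification of the point above, not a model of what the OSDs actually do.

```python
KIB = 1024
MIB = 1024 * KIB

def throughput_mib_s(iops, io_size_bytes):
    """Data moved per second, in MiB/s, for a given IOPS and IO size."""
    return iops * io_size_bytes / MIB

def ops_per_mib(io_size_bytes):
    """Operations needed to move 1 MiB at this IO size."""
    return MIB / io_size_bytes

# (IOPS, IO size) pairs taken from the fio and rados bench output in this thread.
workloads = {
    "fio 4K randread, new cluster": (200, 4 * KIB),
    "fio 4K randread, old cluster": (1406, 4 * KIB),
    "rados bench 4M read, new cluster": (369, 4 * MIB),
    "rados bench 4M read, old cluster": (70, 4 * MIB),
}

for name, (iops, size) in workloads.items():
    print(f"{name}: {throughput_mib_s(iops, size):.2f} MiB/s, "
          f"{ops_per_mib(size):g} ops per MiB")
```

Each rados bench operation moves 1024 times more data than a 4K fio read, so the IOPS figures from the two tools measure different things. Comparing fio against fio (new versus old cluster) is the like-for-like comparison.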



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Jason
