Re: Bad IO performance CephFS vs. NFS for block size 4k/128k

On Mon, Sep 4, 2017 at 4:27 PM, <c.monty@xxxxxx> wrote:
Hello!

I'm validating IO performance of CephFS vs. NFS.

To do this, I mounted both filesystems on the same client.
Then I ran fio with the following parameter combinations:
action = "" randrw
blocksize = 4k 128k 8m
rwmixread = 70 50 30
32 jobs run in parallel
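
For reference, each combination corresponds roughly to a plain fio invocation like the one below (the directory and file size here are only placeholders, not the exact job definition):

fio --name=randrw-test --directory=/mnt/cephfs --rw=randrw --rwmixread=70 \
    --bs=4k --numjobs=32 --size=1G --runtime=60 --time_based --group_reporting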

The NFS share stripes over 5 virtual disks with a 4+1 RAID5 configuration; each disk is ~8TB.
The CephFS cluster has 2 MDS servers (1 up:active, 1 up:standby); each MDS node also hosts 47 OSDs, where each OSD is backed by a single 8TB disk.
(The disks used for the RAID5 and for the OSDs are identical.)

So that's a 2-node cluster? I'm assuming filestore OSDs with journals co-located on the OSDs and 2x or 3x replication. The NFS server on local storage is going to perform much better, as you've found.
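
If you can paste the output of the commands below, that would confirm the layout and the replication level (put your actual CephFS data pool name in the last one):

ceph -s
ceph osd tree
ceph osd pool get <cephfs-data-pool> size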

What I can see is that IO performance with blocksize 8m is slightly better on CephFS, but worse by a factor of 4-10 with blocksize 4k / 128k.

Not surprising. You can expect small random IOs on CephFS to pay full network round-trip and replication latency on every single operation.
Here the stats for randrw with mix 30:
ld9930:/home # tail -n 3 ld9930-fio-test-cephfs-randrw30-8m
Run status group 0 (all jobs):
   READ: bw=335MiB/s (351MB/s), 335MiB/s-335MiB/s (351MB/s-351MB/s), io=19.7GiB (21.2GB), run=60099-60099msec
  WRITE: bw=753MiB/s (789MB/s), 753MiB/s-753MiB/s (789MB/s-789MB/s), io=44.2GiB (47.5GB), run=60099-60099msec

ld9930:/home # tail -n 3 ld9930-fio-test-nfs-randrw30-8m
Run status group 0 (all jobs):
   READ: bw=324MiB/s (340MB/s), 324MiB/s-324MiB/s (340MB/s-340MB/s), io=19.0GiB (20.5GB), run=60052-60052msec
  WRITE: bw=725MiB/s (760MB/s), 725MiB/s-725MiB/s (760MB/s-760MB/s), io=42.6GiB (45.7GB), run=60052-60052msec

ld9930:/home # tail -n 3 ld9930-fio-test-nfs-randrw30-128k
Run status group 0 (all jobs):
   READ: bw=287MiB/s (301MB/s), 287MiB/s-287MiB/s (301MB/s-301MB/s), io=16.9GiB (18.7GB), run=60006-60006msec
  WRITE: bw=667MiB/s (700MB/s), 667MiB/s-667MiB/s (700MB/s-700MB/s), io=39.1GiB (41.1GB), run=60006-60006msec

ld9930:/home # tail -n 3 ld9930-fio-test-cephfs-randrw30-128k
Run status group 0 (all jobs):
   READ: bw=69.2MiB/s (72.6MB/s), 69.2MiB/s-69.2MiB/s (72.6MB/s-72.6MB/s), io=4172MiB (4375MB), run=60310-60310msec
  WRITE: bw=161MiB/s (169MB/s), 161MiB/s-161MiB/s (169MB/s-169MB/s), io=9732MiB (10.3GB), run=60310-60310msec

ld9930:/home # tail -n 3 ld9930-fio-test-cephfs-randrw30-4k
Run status group 0 (all jobs):
   READ: bw=5631KiB/s (5766kB/s), 5631KiB/s-5631KiB/s (5766kB/s-5766kB/s), io=330MiB (346MB), run=60043-60043msec
  WRITE: bw=12.8MiB/s (13.4MB/s), 12.8MiB/s-12.8MiB/s (13.4MB/s-13.4MB/s), io=767MiB (804MB), run=60043-60043msec

ld9930:/home # tail -n 3 ld9930-fio-test-nfs-randrw30-4k
Run status group 0 (all jobs):
   READ: bw=77.2MiB/s (80.8MB/s), 77.2MiB/s-77.2MiB/s (80.8MB/s-80.8MB/s), io=4621MiB (4846MB), run=60004-60004msec
  WRITE: bw=180MiB/s (188MB/s), 180MiB/s-180MiB/s (188MB/s-188MB/s), io=10.6GiB (11.4GB), run=60004-60004msec
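
To put the 4k numbers into per-operation terms (rough arithmetic, assuming all 32 jobs were busy for the full 60s run):

CephFS 4k: 5631 KiB/s read + 12.8 MiB/s write ≈ 1400 + 3280 ≈ 4680 IOPS in total,
           i.e. ~146 IOPS per job, or roughly 7 ms per 4k operation.
NFS 4k:    77.2 MiB/s read + 180 MiB/s write ≈ 19800 + 46100 ≈ 65900 IOPS in total,
           i.e. ~2060 IOPS per job, or roughly 0.5 ms per 4k operation.

Around 7 ms per 4k IO is consistent with that per-operation network and replication cost on HDD-backed filestore OSDs; the ~0.5 ms on the NFS side points at the local server acknowledging out of page cache / RAID controller cache.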


This implies that, to get good IO performance out of CephFS, only workloads with a blocksize > 128k (I guess > 1M) should be used.
Can anybody confirm this?

THX
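
One way to narrow down where the 4k penalty comes from would be to benchmark the data pool directly with rados bench, using a matching object size and queue depth (replace cephfs_data with whatever your data pool is called):

rados bench -p cephfs_data 60 write -b 4096 -t 32 --no-cleanup
rados bench -p cephfs_data 60 rand -t 32
rados -p cephfs_data cleanup

If the raw pool shows similar per-operation latency, the limit is the network/replication path rather than anything CephFS-specific.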
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
