Re: CephFS: slow writes over NFS when fs is mounted with kernel driver but fast with Fuse

Jan Schermer <jan@xxxxxxxxxxx> · Fri, 3 Jun 2016 11:12:02 +0200

It should be noted that using "async" with NFS _will_ corrupt your data if the NFS server crashes or restarts before writes it has already acknowledged reach stable storage.
It's ok-ish for something like an image library, but it's most certainly not OK for VM drives, databases, or any kind of binary blobs that you can't recreate.
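
To check whether a server is already exporting anything with "async", something along these lines might help. It is only a sketch: it assumes the exports live in a file in the usual /etc/exports format, and the function name list_async_exports and the sample paths are made up for illustration.

```shell
#!/bin/sh
# Print export lines whose option list contains "async".
# "async" is matched only as a whole option inside the parentheses,
# so a path that merely contains the string is not reported.
list_async_exports() {
    # $1: path to an exports file (defaults to /etc/exports)
    grep -v '^[[:space:]]*#' "${1:-/etc/exports}" |
        grep -E '\(([^)]*,)?async(,[^)]*)?\)'
}

# Demo on a throwaway file; the paths below are hypothetical.
demo=$(mktemp)
printf '%s\n' \
    '/mnt/cephfs/images *(rw,async,no_subtree_check)' \
    '/mnt/cephfs/vms    *(rw,sync,no_subtree_check)' > "$demo"
list_async_exports "$demo"
rm -f "$demo"
```

The demo prints only the images line. Commented-out entries are skipped, so a disabled "async" export does not raise a false alarm.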

If ceph-fuse is fast (you are testing that on the NFS client side, right?) then it must be completely ignoring the sync flag the NFS server asks for when doing IO. I'd call that a serious bug unless it's documented somewhere...

Jan

> On 03 Jun 2016, at 06:03, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> 
> On Mon, May 30, 2016 at 10:29 PM, David <dclistslinux@xxxxxxxxx> wrote:
>> Hi All
>> 
>> I'm having an issue with slow writes over NFS (v3) when cephfs is mounted
>> with the kernel driver. Writing a single 4K file from the NFS client is
>> taking 3 - 4 seconds; however, a 4K write (with sync) into the same folder
>> on the server is as fast as you would expect. When mounted with ceph-fuse,
>> I don't get this issue on the NFS client.
>> 
>> Test environment is a small cluster with a single MON and single MDS, all
>> running 10.2.1. CephFS metadata is on an SSD pool, data is on spinners. The
>> NFS server is CentOS 7; I've tested with the current shipped kernel (3.10),
>> ELrepo 4.4 and ELrepo 4.6.
>> 
>> More info:
>> 
>> With the kernel driver, I mount the filesystem with "-o name=admin,secret"
>> 
>> I've exported a folder with the following options:
>> 
>> *(rw,root_squash,sync,wdelay,no_subtree_check,fsid=1244,sec=1)
>> 
>> I then mount the folder on a CentOS 6 client with the following options (all
>> default):
>> 
>> rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.3.231,mountvers=3,mountport=597,mountproto=udp,local_lock=none
>> 
>> A small 4k write is taking 3 - 4 secs:
>> 
>> # time dd if=/dev/zero of=testfile bs=4k count=1
>> 1+0 records in
>> 1+0 records out
>> 4096 bytes (4.1 kB) copied, 3.59678 s, 1.1 kB/s
>> 
>> real    0m3.624s
>> user    0m0.000s
>> sys     0m0.001s
>> 
>> But a sync write on the server directly into the same folder is fast (this is
>> with the kernel driver):
>> 
>> # time dd if=/dev/zero of=testfile2 bs=4k count=1 conv=fdatasync
>> 1+0 records in
>> 1+0 records out
>> 4096 bytes (4.1 kB) copied, 0.0121925 s, 336 kB/s
> 
> 
> Your NFS export has the sync option. 'dd if=/dev/zero of=testfile bs=4k
> count=1' on the NFS client is equivalent to 'dd if=/dev/zero of=testfile
> bs=4k count=1 conv=fsync' on cephfs. The reason the sync metadata
> operation takes 3~4 seconds is that the MDS flushes its journal every
> 5 seconds. Adding the async option to the NFS export avoids this delay.
> 
>> 
>> real    0m0.015s
>> user    0m0.000s
>> sys     0m0.002s
>> 
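
The comparison Yan describes above can be tried directly on the NFS server's CephFS mount. A rough sketch, assuming a dd that supports conv=fsync (GNU coreutils does); the function name timed_4k_write and the CEPHFS_DIR variable are placeholders invented for this example, not anything Ceph provides.

```shell
#!/bin/sh
# Time a single 4K write into a directory, with and without fsync.
timed_4k_write() {
    # $1: target directory
    # $2: extra dd options (e.g. conv=fsync), may be empty
    f="$1/ddtest.$$"
    # dd reports its elapsed time on stderr; keep only that summary line.
    # $2 is left unquoted on purpose so an empty value expands to nothing.
    dd if=/dev/zero of="$f" bs=4k count=1 $2 2>&1 | tail -n 1
    rm -f "$f"
}

# CEPHFS_DIR is a hypothetical variable: point it at the CephFS mount.
dir=${CEPHFS_DIR:-/tmp}
echo "buffered:";   timed_4k_write "$dir" ""
echo "conv=fsync:"; timed_4k_write "$dir" conv=fsync
```

If the explanation holds, the conv=fsync run on a kernel-mounted CephFS directory should be the one that shows the multi-second delay seen from the NFS client.
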
>> If I mount cephfs with Fuse instead of the kernel, the NFS client write is
>> fast:
>> 
>> dd if=/dev/zero of=fuse01 bs=4k count=1
>> 1+0 records in
>> 1+0 records out
>> 4096 bytes (4.1 kB) copied, 0.026078 s, 157 kB/s
>> 
> 
> In this case, ceph-fuse sends an extra request (a getattr request on the
> directory) to the MDS. The request causes the MDS to flush its journal.
> Whether or not the client sends the extra request depends on what
> capabilities it has. What capabilities the client has, in turn, depends
> on how many clients are accessing the directory. In my test, NFS on
> ceph-fuse is not always fast.
> 
> Yan, Zheng
> 
> 
>> Does anyone know what's going on here?
> 
> 
> 
>> 
>> Thanks
>> 
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
