Re: CephFS: slow writes over NFS when fs is mounted with kernel driver but fast with Fuse

"Yan, Zheng" <ukernel@xxxxxxxxx> · Mon, 6 Jun 2016 11:08:04 +0800

On Fri, Jun 3, 2016 at 10:43 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> I'd be worried about it getting "fast" all of a sudden. Test crash
> consistency.
> If you test something like file creation, you should be able to estimate
> whether it should be that fast (it should be some fraction of the
> theoretical IOPS of the drives/backing RBD device...).
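One rough way to run the check Jan describes, sketched in Python: create small files in the directory under test, fsync each file and then its directory, and compare the resulting creates per second against what the backing drives could plausibly sustain. The helper name sync_create_rate, the 4 KiB file size and the default file count are illustrative assumptions, not anything from this thread.

```python
import os
import tempfile
import time


def sync_create_rate(directory: str, count: int = 50) -> float:
    """Create `count` 4 KiB files in `directory` (hypothetical helper), fsync
    each file and then the directory, and return durable creates per second.

    The test files are removed afterwards, outside the timed section."""
    payload = b"\0" * 4096
    paths = [os.path.join(directory, f"synctest_{i}") for i in range(count)]
    dir_fd = os.open(directory, os.O_RDONLY)  # Linux allows fsync on a directory fd
    try:
        start = time.perf_counter()
        for path in paths:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                os.write(fd, payload)
                os.fsync(fd)  # persist the file data and inode
            finally:
                os.close(fd)
            os.fsync(dir_fd)  # persist the new directory entry
        elapsed = time.perf_counter() - start
    finally:
        os.close(dir_fd)
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
    return count / max(elapsed, 1e-9)


if __name__ == "__main__":
    # A temporary local directory keeps the demo self-contained; point this at
    # the NFS-mounted directory under test for a meaningful number.
    with tempfile.TemporaryDirectory() as d:
        print(f"{sync_create_rate(d):.1f} sync creates/s")
```

If the rate comes out well above what the drives could deliver, that is the hint Jan mentions that sync may not be doing its job.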

The sudden "fast" is because the MDS flushes its journal more frequently.
There is no risk of metadata/data loss.

Yan, Zheng

>
> If it's too fast then maybe the "sync" isn't working properly...
>
> Jan
>
> On 03 Jun 2016, at 16:26, David <dclistslinux@xxxxxxxxx> wrote:
>
> Zheng, thanks for looking into this; it makes sense. Strangely, though, I've
> set up a new NFS server (different hardware, same OS, kernel, etc.) and I'm
> unable to recreate the issue. I'm no longer getting the delay, and the NFS
> export is still using sync. I'm now comparing the servers to see what's
> different on the original server. Apologies if I've wasted your time on
> this!
>
> Jan, I did some more testing with Fuse on the original server and I was
> seeing the same issue; yes, I was testing from the NFS client. As above, I
> think there was something weird with that original server. Noted on sync vs
> async; I plan on sticking with sync.
>
> On Fri, Jun 3, 2016 at 5:03 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Mon, May 30, 2016 at 10:29 PM, David <dclistslinux@xxxxxxxxx> wrote:
>> > Hi All
>> >
>> > I'm having an issue with slow writes over NFS (v3) when CephFS is
>> > mounted with the kernel driver. Writing a single 4K file from the NFS
>> > client takes 3-4 seconds; however, a 4K write (with sync) into the same
>> > folder on the server is as fast as you would expect. When mounted with
>> > ceph-fuse, I don't get this issue on the NFS client.
>> >
>> > The test environment is a small cluster with a single MON and a single
>> > MDS, all running 10.2.1. CephFS metadata is on an SSD pool; data is on
>> > spinners. The NFS server is CentOS 7; I've tested with the currently
>> > shipped kernel (3.10), ELRepo 4.4 and ELRepo 4.6.
>> >
>> > More info:
>> >
>> > With the kernel driver, I mount the filesystem with "-o name=admin,secret"
>> >
>> > I've exported a folder with the following options:
>> >
>> > *(rw,root_squash,sync,wdelay,no_subtree_check,fsid=1244,sec=1)
>> >
>> > I then mount the folder on a CentOS 6 client with the following options
>> > (all default):
>> >
>> >
>> > rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.3.231,mountvers=3,mountport=597,mountproto=udp,local_lock=none
>> >
>> > A small 4k write takes 3-4 seconds:
>> >
>> > # time dd if=/dev/zero of=testfile bs=4k count=1
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 3.59678 s, 1.1 kB/s
>> >
>> > real    0m3.624s
>> > user    0m0.000s
>> > sys     0m0.001s
>> >
>> > But a sync write on the server directly into the same folder is fast
>> > (this is with the kernel driver):
>> >
>> > # time dd if=/dev/zero of=testfile2 bs=4k count=1 conv=fdatasync
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 0.0121925 s, 336 kB/s
>>
>>
>> Your NFS export has the sync option, so 'dd if=/dev/zero of=testfile bs=4k
>> count=1' on the NFS client is equivalent to 'dd if=/dev/zero of=testfile
>> bs=4k count=1 conv=fsync' on CephFS. The reason the sync metadata
>> operation takes 3-4 seconds is that the MDS flushes its journal every
>> 5 seconds. Adding the async option to the NFS export can avoid this delay.
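A toy model of the delay Zheng describes, sketched in Python. It assumes the journal is flushed on a strict 5-second timer and at no other time, and that the sync request forced by the export arrives at a random moment; the function names are hypothetical. Under those assumptions a request waits anywhere from 0 to just under 5 s, about 2.5 s on average, so a single 3.6 s write is well within range.

```python
import random

FLUSH_INTERVAL_S = 5.0  # journal flush period described above (assumed exact)


def wait_for_periodic_flush(arrival_s: float, interval_s: float = FLUSH_INTERVAL_S) -> float:
    """Seconds a sync request arriving at `arrival_s` waits for the next
    journal flush, if flushes happen only at multiples of `interval_s`."""
    remainder = arrival_s % interval_s
    return 0.0 if remainder == 0 else interval_s - remainder


def mean_wait(samples: int = 100_000, interval_s: float = FLUSH_INTERVAL_S, seed: int = 0) -> float:
    """Average wait over requests arriving at uniformly random times."""
    rng = random.Random(seed)
    total = sum(wait_for_periodic_flush(rng.uniform(0.0, 1000.0), interval_s)
                for _ in range(samples))
    return total / samples


if __name__ == "__main__":
    print(f"worst case: just under {FLUSH_INTERVAL_S:.0f} s")
    print(f"average:    {mean_wait():.2f} s")  # close to interval / 2
```

If the MDS also flushes earlier for other reasons, the timer only bounds the worst case; the real wait can be shorter.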
>>
>> >
>> > real    0m0.015s
>> > user    0m0.000s
>> > sys     0m0.002s
>> >
>> > If I mount CephFS with Fuse instead of the kernel, the NFS client write
>> > is fast:
>> >
>> > dd if=/dev/zero of=fuse01 bs=4k count=1
>> > 1+0 records in
>> > 1+0 records out
>> > 4096 bytes (4.1 kB) copied, 0.026078 s, 157 kB/s
>> >
>>
>> In this case, ceph-fuse sends an extra request (a getattr request on the
>> directory) to the MDS. That request causes the MDS to flush its journal.
>> Whether or not the client sends the extra request depends on which
>> capabilities it holds. Which capabilities the client holds, in turn,
>> depends on how many clients are accessing the directory. In my tests, NFS
>> on ceph-fuse is not always fast.
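Extending the same idea as a toy model (an assumption-laden sketch in Python with hypothetical names, not how the MDS is implemented): a sync request waits for whichever comes first, the periodic 5-second journal flush or a flush triggered by another request such as ceph-fuse's extra getattr. When the client does not send that request (which, per Zheng, depends on its capabilities), only the timer applies.

```python
from typing import Optional


def wait_for_flush(arrival_s: float, trigger_delay_s: Optional[float],
                   interval_s: float = 5.0) -> float:
    """Seconds until a sync request arriving at `arrival_s` is made durable.

    The journal is flushed on a fixed `interval_s` timer, or earlier if some
    other request (e.g. the directory getattr ceph-fuse sent) reaches the MDS
    `trigger_delay_s` seconds after arrival. None means no such request."""
    remainder = arrival_s % interval_s
    periodic_wait = 0.0 if remainder == 0 else interval_s - remainder
    if trigger_delay_s is None:
        return periodic_wait
    return min(periodic_wait, trigger_delay_s)


if __name__ == "__main__":
    # Same arrival time, with and without the extra flush-triggering request.
    print(wait_for_flush(12.4, None))   # waits for the periodic timer
    print(wait_for_flush(12.4, 0.01))   # flushed almost immediately
```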
>>
>> Yan, Zheng
>>
>>
>> > Does anyone know what's going on here?
>>
>>
>>
>> >
>> > Thanks
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >