Re: CephFS write performance

On Thu, Jul 21, 2016 at 10:55 AM, Fabiano de O. Lucchese
<flucchese@xxxxxxxxx> wrote:
> Hey, guys.
>
> I'm still feeling unlucky about these experiments. Here's what I did:
>
> 1)      Set the parameters described below in ceph.conf
> 2)      Push the ceph.conf to all nodes using ceph-deploy
> 3)      Restart the monitor, MDS, and OSDs on all nodes
> 4)      Ran the test at least twice and looked at the results from the
> second run.
> 5)      I mounted the filesystem using mount -t ceph 10.76.38.57:/
> /mnt/mycephfs -o name=admin,secret=xxxxxxxxx
> 6)      Ran the benchmark
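>
> (A sketch of steps 2, 3 and 5, assuming systemd-managed daemons; node
> names are placeholders:)
>
>     # push the edited ceph.conf from the admin node to every node
>     ceph-deploy --overwrite-conf config push cephnode1 cephnode2 cephnode3
>     # on each node, restart the daemons so they pick up the new settings
>     systemctl restart ceph-mon.target ceph-mds.target ceph-osd.target
>     # kernel mount of the filesystem (step 5)
>     mount -t ceph 10.76.38.57:/ /mnt/mycephfs -o name=admin,secret=xxxxxxxxx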
>
> I modified the following parameters and ran each test separately
>
> -          “osd journal size” to 5 GB, 10 GB, and 20 GB
> -          “osd client message size cap” to 0, 1, and 1024

Well, that's a pretty disastrous one. 0 is unlimited, but otherwise
you're restricting the OSD to 1 or 1024 bytes of in-flight client
message data at once.

The client_oc* params I mentioned should help even things out without
serializing it so badly, although as a client-side thing it only
applies to ceph-fuse. I'm not sure what to do about kernel mounts.
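
If you want to try them, those settings go in the [client] section of
ceph.conf on the ceph-fuse host, along these lines (the lowered
max_dirty is just an illustrative value; defaults are noted in the
comments):

    [client]
    client_oc_size = 209715200         # object cache size; default 200 MB
    client_oc_max_dirty = 26214400     # e.g. 25 MB, down from the 100 MB default
    client_oc_target_dirty = 8388608   # flush target; default 8 MB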
-Greg


> -          “osd pool default min size” to 1 and 3
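>
> (For illustration, one such combination written out in ceph.conf; note
> that "osd journal size" is given in MB and "osd client message size
> cap" in bytes:)
>
>     [global]
>     osd pool default min size = 1
>
>     [osd]
>     osd journal size = 10240            # 10 GB
>     osd client message size cap = 0     # 0 = uncapped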
>
>
> In all of the above tests I observed a similar pattern:
> -          about 5-6 Gbps throughput at the start of the test, gradually
> dropping to 900 Mbps by the time the test completes. I also observed that
> after 150-160 files have been written there is a wait of about 10-15
> seconds before the next file is written.
>
> Test using FUSE:
> I also installed ceph-fuse on cephnode1 and mounted the filesystem using
> the following command:
> ceph-fuse -m 10.76.38.56 /mnt/mycephfs
>
> I saw a drastic reduction in write throughput to around 170 Mbps. The system
> took about 5-6 seconds before it started writing any files, and the write
> throughput stayed consistently at 150-180 Mbps while the directory was
> mounted using FUSE.
>
> Any additional thoughts? Would the problem be due to my NFS client?
>
> Regards,
>
> F.
>
> ________________________________
> From: Gregory Farnum <gfarnum@xxxxxxxxxx>
> To: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Cc: Fabiano de O. Lucchese <flucchese@xxxxxxxxx>;
> "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Tuesday, July 19, 2016 5:23 PM
> Subject: Re:  CephFS write performance
>
> On Tue, Jul 19, 2016 at 9:39 AM, Patrick Donnelly <pdonnell@xxxxxxxxxx>
> wrote:
>> On Tue, Jul 19, 2016 at 10:25 AM, Fabiano de O. Lucchese
>> <flucchese@xxxxxxxxx> wrote:
>>> I configured the cluster to replicate data twice (3 copies), so these
>>> numbers fall within my expectations. So far so good, but here comes the
>>> issue: I configured CephFS and mounted a share locally on one of my
>>> servers.
>>> When I write data to it, it shows abnormally high performance at the
>>> beginning for about 5 seconds, stalls for about 20 seconds, and then picks
>>> up again. For long-running tests, the observed write throughput is very
>>> close to what the rados bench provided (about 640 MB/s), but for
>>> short-lived tests, I get peak performance of over 5 GB/s. I know that
>>> journaling is expected to cause spiky performance patterns like that, but
>>> not to this level, which makes me think that CephFS is buffering my
>>> writes and returning control to the client before persisting them to the
>>> journal, which looks undesirable.
>>
>> The client is buffering the writes to RADOS, which would give you the
>> abnormally high initial performance until the cache needs to be flushed.
>> You might try tweaking certain OSD settings:
>>
>> http://docs.ceph.com/docs/hammer/rados/configuration/osd-config-ref/
>>
>> in particular: "osd client message size cap". Also:
>
> I am reasonably sure you don't want to change the message size cap;
> that's entirely an OSD-side throttle on how much dirty data it lets in
> before it stops reading off the wire, and I don't think the client
> feeds back from outgoing data. More likely it's about how much dirty
> data is being absorbed by the client before it forces writes out to
> the OSDs; you want to look at
>
> client_oc_size (default 1024*1024*200, aka 200 MB)
> client_oc_max_dirty (default 100 MB)
> client_oc_target_dirty (default 8 MB)
>
> and turn down the max dirty limits if you're finding it's too bumpy a ride.
> -Greg
>
>
>>
>> http://docs.ceph.com/docs/hammer/rados/configuration/journal-ref/
>>
>> --
>> Patrick Donnelly
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



