Re: CephFS write performance

John Spray <jspray@xxxxxxxxxx> · Tue, 19 Jul 2016 18:51:30 +0100

On Tue, Jul 19, 2016 at 3:25 PM, Fabiano de O. Lucchese
<flucchese@xxxxxxxxx> wrote:
> Hi, folks.
>
> I'm conducting a series of experiments and tests with CephFS and have been
> facing a behavior over which I can't seem to have much control.
>
> I configured a 5-node Ceph cluster running on enterprise servers. Each
> server has 10 x 6TB HDDs and 2 x 800GB SSDs. I configured the SSDs as a
> RAID-1 device for journaling and also two of the HDDs for the same purpose
> for the sake of comparison. All other 8 HDDs are configured as OSDs. The
> servers have 196GB of RAM and our private network is backed by a 40GB/s
> Brocade switch (frontend is 10Gb/s).
>
> When benchmarking the HDDs directly, here's the performance I get:
>
> dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=10G count=1 oflag=direct &
>
> 0+1 records in
> 0+1 records out
> 2147479552 bytes (2.1 GB) copied, 11.684 s, 184 MB/s
>
> For read performance:
>
> dd if=/var/lib/ceph/osd/ceph-0/deleteme of=/dev/null bs=10G count=1 iflag=direct &
>
> 0+1 records in
> 0+1 records out
> 2147479552 bytes (2.1 GB) copied, 8.30168 s, 259 MB/s
>
> Now, when I benchmark the OSDs configured with HDD-based journaling, here's
> what I get:
>
> [root@cephnode1 ceph-cluster]# ceph tell osd.1 bench
>
> {
>     "bytes_written": 1073741824,
>     "blocksize": 4194304,
>     "bytes_per_sec": 426840870.000000
> }
>
> which looks coherent. If I switch to the SSD-based journal, here's the new
> figure:
>
> [root@cephnode1 ~]# ceph tell osd.1 bench
> {
>     "bytes_written": 1073741824,
>     "blocksize": 4194304,
>     "bytes_per_sec": 805229549.000000
> }
>
> which, again, looks as expected to me.
>
> Finally, when I run the rados bench, here's what I get:
>
> rados bench -p cephfs_data 300 write --no-cleanup && rados bench -p cephfs_data 300 seq
>
> Total time run:         300.345098
> Total writes made:      48327
> Write size:             4194304
> Bandwidth (MB/sec):     643.620
>
> Stddev Bandwidth:       114.222
> Max bandwidth (MB/sec): 1196
> Min bandwidth (MB/sec): 0
> Average Latency:        0.0994289
> Stddev Latency:         0.112926
> Max latency:            1.85983
> Min latency:            0.0139412
>
> ----------------------------------------
>
> Total time run:        300.121930
> Total reads made:      31990
> Read size:             4194304
> Bandwidth (MB/sec):    426.360
>
> Average Latency:       0.149346
> Max latency:           1.77489
> Min latency:           0.00382452
>
> I configured the cluster to replicate data twice (3 copies), so these
> numbers fall within my expectations. So far so good, but here comes the
> issue: I configured CephFS and mounted a share locally on one of my servers.
> When I write data to it, it shows abnormally high performance at the
> beginning for about 5 seconds, stalls for about 20 seconds and then picks up
> again. For long-running tests, the observed write throughput is very close
> to what the rados bench provided (about 640 MB/s), but for short-lived
> tests, I get peak performance of over 5 GB/s. I know that journaling is
> expected to cause spiky performance patterns like that, but not to this
> level, which makes me think that CephFS is buffering my writes and returning
> control to the client before persisting them to the journal, which looks
> undesirable.

If you want to skip the caching in any filesystem, use the O_DIRECT
flag when opening a file.

You don't say exactly what your benchmark is, but presumably you have
a shortage of fsync calls, so you're not actually waiting for things
to persist?
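
As a rough illustration of the difference (a sketch only, since the actual
benchmark isn't specified): the Python helper below, under the hypothetical
name timed_write, writes zeros to a file and, when sync=True, stops the
clock only after os.fsync() has returned. With sync=False it measures how
fast the client-side cache accepts the data, which is presumably what the
short-lived bursts reflect.

```python
import os
import time


def timed_write(path, total_bytes, block_size=4 * 1024 * 1024, sync=True):
    """Write total_bytes of zeros to path and return throughput in MB/s.

    With sync=True the timer stops only after os.fsync() returns, so the
    result includes the time spent flushing buffered data. With sync=False
    the timer stops as soon as the write() calls have returned.
    """
    block = b"\0" * block_size
    remaining = total_bytes
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while remaining > 0:
            # The last chunk may be shorter than a full block.
            chunk = block if remaining >= block_size else block[:remaining]
            # os.write() may write fewer bytes than requested; count what
            # was actually written and loop until everything is out.
            remaining -= os.write(fd, chunk)
        if sync:
            # Block until the kernel reports the data as flushed.
            os.fsync(fd)
    finally:
        os.close(fd)
    # Guard against a zero interval on very small writes.
    elapsed = max(time.monotonic() - start, 1e-9)
    return total_bytes / elapsed / 1e6
```

Calling it on a file under the CephFS mount point, once with sync=False and
once with sync=True, should show the gap between buffered and persisted
throughput. With GNU dd, conv=fsync gives a similar effect to the sync=True
case, and oflag=direct bypasses the page cache as in the raw-disk tests
quoted above.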

John

> I searched the web for a couple of days looking for ways to disable this
> apparent write buffering, but couldn't find anything. So here comes my
> question: how can I disable it?
>
> Thanks and regards,
>
> F. Lucchese
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>