Re: CephFS: Writes are faster than reads?

On 16-09-14 18:21, Andreas Gerstmayr wrote:
Hello,

I'm currently performing some benchmark tests with our Ceph storage
cluster and trying to find the bottleneck in our system.

I'm writing a random 30GB file with the following command:
$ time fio --name=job1 --rw=write --blocksize=1MB --size=30GB
--randrepeat=0 --end_fsync=1
[...]
  WRITE: io=30720MB, aggrb=893368KB/s, minb=893368KB/s,
maxb=893368KB/s, mint=35212msec, maxt=35212msec

real    0m35.539s

This makes use of the page cache, but fsync()s at the end (network
traffic from the client stops here, so the OSDs should have the data).
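
One caveat (an assumption on my side, not shown in the commands above): unless the client's page cache is dropped between the write and the read, part of the read below may be served from local memory rather than from the OSDs:

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches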

When I read the same file back:
$ time fio --name=job1 --rw=read --blocksize=1MB --size=30G
[...]
    READ: io=30720MB, aggrb=693854KB/s, minb=693854KB/s,
maxb=693854KB/s, mint=45337msec, maxt=45337msec

real    0m45.627s

It takes 10s longer. Why? When writing data to a Ceph storage cluster,
the data is written twice (unbuffered to the journal and buffered to
the backing filesystem [1]). Reading, on the other hand, should be much
faster: it needs only a single operation, the data should already be in
the page cache of the OSDs (I'm reading the same file I've written
before, and the OSDs have plenty of RAM), and reading from disk is
generally faster than writing. Any idea what is going on in the
background that makes reads more expensive than writes?

I am not an expert here, but I think it basically boils down to this: you read the file linearly, while the writes (the cache flush) happen in parallel.

If you could read multiple parts of the same file in parallel, you could achieve better speeds.
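
For instance, fio can issue the parallel reads itself. A minimal sketch (untested; the option values are illustrative, and job1.0.0 is fio's default name for the file written by the job above): four jobs, each reading a different quarter of the same 30GB file:

$ time fio --name=parread --filename=job1.0.0 --rw=read --blocksize=1MB \
      --size=7680MB --offset_increment=7680MB --numjobs=4 --group_reporting

Alternatively, a single job can keep several reads in flight with --ioengine=libaio --iodepth=16 --direct=1 (direct I/O is required for libaio to actually run asynchronously).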


I've run these tests multiple times with fairly consistent results.

Cluster Config:
Ceph Jewel, 3 nodes with 256GB RAM and 25 disks each (HDDs only,
journal on the same disk)
Pool with size=1 (i.e. a single replica) and 2048 PGs; CephFS stripe
unit: 1MB, stripe count: 10, object size: 10MB
10GbE, separate frontend and backend networks
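
As a side note (assuming the attr package is installed, and job1.0.0 is the test file from above): the layout of an existing CephFS file can be verified through its virtual extended attribute, which reports stripe_unit, stripe_count, object_size and the data pool:

$ getfattr -n ceph.file.layout job1.0.0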

[1] https://www.sebastien-han.fr/blog/2014/02/17/ceph-io-patterns-the-bad/


Thanks,
Andreas
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

