Re: Poor performance with three nodes

On 10/2/2013 3:50 PM, Sage Weil wrote:
On Wed, 2 Oct 2013, Eric Lee Green wrote:
By contrast, that same dd to an iSCSI volume exported by one of the servers
wrote at 240 megabytes per second. Order of magnitude difference.
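
(The dd smoketest itself isn't quoted in this thread; presumably it was a streaming write along these lines, with the device path and count as placeholders rather than anything from the thread:

dd if=/dev/zero of=/dev/<rbd-or-iscsi-device> bs=512k count=8192

i.e. a single stream of 512K writes against the block device.)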
Can you see what 'rados -p rbd bench 60 write' tells you?

Pretty much the same as what I got with the dd smoketest:

 Total time run:         62.526671
Total writes made:      770
Write size:             4194304
Bandwidth (MB/sec):     49.259

Stddev Bandwidth:       36.0099
Max bandwidth (MB/sec): 120
Min bandwidth (MB/sec): 0
Average Latency:        1.29088
Stddev Latency:         1.75083
Max latency:            11.2005
Min latency:            0.102783
[root@stack1 ~]#
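
For what it's worth, those numbers hang together: assuming the default of 16 concurrent operations, 16 in-flight 4 MB writes against the ~1.29 s average latency above works out to roughly 50 MB/s, which is about the 49.259 MB/s reported.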


I suspect the problem here is an unfortunate combination of what dd does
(1 outstanding write at a time) and what iSCSI is probably doing
(acknowledging the write before it is written to the disk--I'm guessing a
write to /dev/* doesn't also send a scsi flush).  This lets you approach
the disk or network bandwidth even though the client/app (dd) is only
dispatching a single 512K IO at a time.

My experience is that while what dd does is not reflective of what a filesystem does, with a block size that large it doesn't matter: 512K bytes outstanding is enough that the latency of issuing writes is no longer the limiting factor. What I get from the dd is within a few percent of what I get copying a large file onto the filesystem, or doing other similar tasks that stream data onto (or off of) the drive.


I'm curious if the iSCSI number changes if you add oflag=direct or
oflag=sync.
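
(Something along these lines, presumably; the target device and count here are placeholders, not anything from this thread:

dd if=/dev/zero of=/dev/sdX bs=512k count=2048 oflag=direct
dd if=/dev/zero of=/dev/sdX bs=512k count=2048 oflag=sync

oflag=direct bypasses the page cache and oflag=sync asks for synchronous writes, so either should take any "acknowledge before it hits disk" effect out of the iSCSI number.)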

It's also worth pointing out that what dd is doing (single outstanding IO)
no sane file system would do, except perhaps during commit/sync time when
it is carefully ordering IOs.  You might want to try the dd to a file
inside a mounted fs instead of to the raw device.
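
(As a sketch of that test, with the mount point as a placeholder:

dd if=/dev/zero of=/mnt/somefs/ddtest bs=512k count=2048 conv=fsync

conv=fsync forces the data out to disk before dd reports a rate, so the result isn't just the page cache.)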

Well, I created a hidden directory in one of the ceph data stores and copied a file into it to see what dd would do in that case. Note that this is with 512-byte blocks, copying from one xfs filesystem to another:

[root@storage1 .t]# dd if=/export/home1/linux.tgz of=linux.tgz
9177080+0 records in
9177080+0 records out
4698664960 bytes (4.7 GB) copied, 15.5054 s, 303 MB/s
[root@storage1 .t]#

Seems similar to what I expect when reading from one SAS channel and writing to the other (3 Gbit/sec SAS channels). I've benchmarked this combination before for streaming writes, and I can maintain that bandwidth pretty much forever.

My conclusion at the moment is that a) Ceph isn't a good match for my infrastructure (it really wants its own dedicated hardware with no RAID), and b) even there I should not expect much more than what I'm seeing above for single-stream writes, though aggregate write performance should scale. Unfortunately (b) doesn't fit my workload, where aggregate bandwidth requirements are modest but burst bandwidth requirements are high.
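
One check I could still do, to separate single-stream behaviour from aggregate, is to re-run the bench with an explicit queue depth; if I have the option right, -t sets the number of concurrent operations and 16 is the default:

rados -p rbd bench 60 write -t 1
rados -p rbd bench 60 write -t 16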
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



