On 10/2/2013 3:50 PM, Sage Weil wrote:
On Wed, 2 Oct 2013, Eric Lee Green wrote:
By contrast, that same dd to an iSCSI volume exported by one of the servers
wrote at 240 megabytes per second, roughly a factor of five difference.
Can you see what 'rados -p rbd bench 60 write' tells you?
Pretty much the same as what I got with the dd smoketest:
Total time run: 62.526671
Total writes made: 770
Write size: 4194304
Bandwidth (MB/sec): 49.259
Stddev Bandwidth: 36.0099
Max bandwidth (MB/sec): 120
Min bandwidth (MB/sec): 0
Average Latency: 1.29088
Stddev Latency: 1.75083
Max latency: 11.2005
Min latency: 0.102783
[root@stack1 ~]#
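(A quick sanity check on those numbers, assuming the bench was left at
its default of 16 concurrent operations: 16 ops x 4 MB / 1.29 s average
latency works out to roughly 50 MB/s, which matches the reported 49.259
MB/sec. If that assumption holds, per-write latency rather than raw
bandwidth is what's limiting things. A follow-up worth trying, not run
here, is to vary the queue depth and see how the number moves:

    rados -p rbd bench 60 write -t 1    # single outstanding write, like dd
    rados -p rbd bench 60 write -t 32   # more parallelism

If -t 32 scales up while -t 1 collapses, single-stream latency is the
limiting factor.)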
I suspect the problem here is an unfortunate combination of what dd does
(1 outstanding write at a time) and what iSCSI is probably doing
(acknowledging the write before it is written to the disk; I'm guessing a
write to /dev/* doesn't also send a SCSI flush). This lets you approach
the disk or network bandwidth even though the client/app (dd) is only
dispatching a single 512K IO at a time.
My experience is that while what dd is doing is not reflective of what a
filesystem does, with a block size that large it doesn't matter: 512 KB
outstanding is sufficient that the latency of issuing writes is no longer
an issue. What I get from dd is within a few percent of what I get
copying a large file onto the filesystem or doing other similar tasks
that stream data onto (or off of) the drive.
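(To put rough numbers on that: at 240 MB/s a 512 KB write completes in
about 2 ms, so one outstanding IO keeps the device busy almost
continuously. Single-stream throughput is roughly block size divided by
per-write latency, though, so if each 512 KB write instead took 50 ms to
be acknowledged, one IO at a time would top out around 10 MB/s no matter
how fast the disks are. Those figures are illustrative, not measured.)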
I'm curious if the iSCSI number changes if you add oflag=direct or
oflag=sync.
It's also worth pointing out that no sane file system would do what dd is
doing (a single outstanding IO), except perhaps during commit/sync time
when it is carefully ordering IOs. You might want to try the dd to a file
inside a mounted fs instead of to the raw device.
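(For reference, the suggested variants would look something like the
following; the device path, mount point, and sizes here are placeholders,
not the exact commands from the earlier test:

    dd if=/dev/zero of=/dev/sdX bs=512k count=2000 oflag=direct  # raw device, bypass page cache
    dd if=/dev/zero of=/dev/sdX bs=512k count=2000 oflag=sync    # raw device, synchronous writes
    dd if=/dev/zero of=/mnt/test/bigfile bs=512k count=2000 conv=fsync  # file on a mounted fs, flushed at the end

The oflag=direct/sync runs take caching out of the picture on the iSCSI
side, and the last one is the 'file inside a mounted fs' case.)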
Well, I created a hidden directory in one of the ceph data stores and
copied a file into it to see what dd would do in that case. Note that
this is with 512-byte blocks (dd's default), copying from one xfs
filesystem to another xfs filesystem:
[root@storage1 .t]# dd if=/export/home1/linux.tgz of=linux.tgz
9177080+0 records in
9177080+0 records out
4698664960 bytes (4.7 GB) copied, 15.5054 s, 303 MB/s
[root@storage1 .t]#
That is about what I would expect when reading from one SAS channel and
writing to the other (3 Gbit/s SAS channels). I've benchmarked this
combination for streaming writes before, and it can sustain that
bandwidth pretty much indefinitely.
My conclusion at the moment is that a) ceph isn't a good match for my
infrastructure, since it really wants its own dedicated hardware with no
RAID, and b) even on dedicated hardware I should not expect much more
than the above for single-stream writes, though aggregate write
performance should scale. Unfortunately, (b) doesn't describe my
workload, where aggregate bandwidth requirements are modest but burst
(single-stream) bandwidth requirements are high.
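(If the aggregate case ever becomes worth verifying, a rough way to
check it is several simultaneous streams, e.g. to separate RBD images;
the device names below are placeholders:

    for i in 0 1 2 3; do
        dd if=/dev/zero of=/dev/rbd$i bs=4M count=1000 oflag=direct &
    done
    wait

If the combined rate climbs well past the ~50 MB/s single-stream figure,
the cluster is latency-bound per stream rather than bandwidth-bound
overall.)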