On 10/2/2013 3:50 PM, Sage Weil wrote:
On Wed, 2 Oct 2013, Eric Lee Green wrote:
By contrast, that same dd to an iSCSI volume exported by one of the servers
wrote at 240 megabytes per second. Order of magnitude difference.
Can you see what 'rados -p rbd bench 60 write' tells you?
Pretty much the same as what I got with the dd smoketest:
Total time run: 62.526671
Total writes made: 770
Write size: 4194304
Bandwidth (MB/sec): 49.259
Stddev Bandwidth: 36.0099
Max bandwidth (MB/sec): 120
Min bandwidth (MB/sec): 0
Average Latency: 1.29088
Stddev Latency: 1.75083
Max latency: 11.2005
Min latency: 0.102783
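As a sanity check on the numbers above, the reported bandwidth follows directly from the totals. Little's law (throughput = operations in flight / latency) then ties it to the average latency. This is a minimal Python sketch. The concurrency of 16 is an assumption (believed to be rados bench's default `-t` value) and is not shown in the output.

```python
# Figures copied from the 'rados bench' output above.
total_time_s = 62.526671
total_writes = 770
write_size = 4194304        # bytes (4 MiB objects)
avg_latency_s = 1.29088

# The reported 49.259 "MB/sec" matches when MB is taken as 2**20 bytes.
MIB = 1024 * 1024


def bench_bandwidth_mib(writes, size_bytes, seconds):
    """Bandwidth = total bytes written / elapsed time, in MiB/s."""
    return writes * size_bytes / seconds / MIB


def littles_law_bandwidth_mib(concurrency, size_bytes, latency_s):
    """Little's law: ops/s = ops in flight / latency; times op size gives bytes/s."""
    return concurrency * size_bytes / latency_s / MIB


measured = bench_bandwidth_mib(total_writes, write_size, total_time_s)
# ASSUMPTION: 16 concurrent ops, believed to be rados bench's default (-t).
predicted = littles_law_bandwidth_mib(16, write_size, avg_latency_s)

print(f"bandwidth from totals: {measured:.3f} MiB/s")
print(f"Little's law (16 ops): {predicted:.1f} MiB/s")
```

If the concurrency assumption holds, the two figures agree to within about 1%. That suggests the low bandwidth comes from per-write latency (over a second on average for a 4 MB object) and not from a throughput cap.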
[root@stack1 ~]#
I suspect the problem here is an unfortunate combination of what dd does
(one outstanding write at a time) and what iSCSI is probably doing
(acknowledging the write before it is written to the disk; I'm guessing a
write to /dev/* doesn't also send a SCSI flush). This lets you approach
the disk or network bandwidth even though the client/app (dd) is only
dispatching a single 512K IO at a time.
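The single-outstanding-write effect can be put in numbers. With queue depth 1, throughput is simply block size divided by per-write latency. This is a minimal Python sketch. The two latencies are hypothetical values picked for illustration, not measurements from this thread.

```python
KIB = 1024


def qd1_throughput_mb_s(block_bytes, per_io_latency_s):
    """With one write in flight, each block waits for the previous ack,
    so throughput is block size / per-write latency (MB = 10**6 bytes)."""
    return block_bytes / per_io_latency_s / 1e6


block = 512 * KIB  # dd bs=512K, one write outstanding at a time

# HYPOTHETICAL latencies, chosen only to show the shape of the effect:
# a target that acks from its write cache vs. one that acks only after
# the data is durably written.
for label, latency_s in [("early ack, 2 ms", 0.002), ("durable ack, 10 ms", 0.010)]:
    print(f"{label:20s} -> {qd1_throughput_mb_s(block, latency_s):6.1f} MB/s")
```

Under those assumed latencies, the same dd falls from roughly 262 MB/s to roughly 52 MB/s purely because of when each write is acknowledged.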
My experience is that while what dd is doing is not reflective of
what a filesystem does, with a block size that large it doesn't matter:
512K outstanding is enough that the latency of issuing writes is no
longer an issue. What I get from dd is within a few percent of what I
get copying a large file onto the filesystem or doing other similar
tasks involving streaming data onto (or off of) the drive.
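Whether 512K outstanding is "enough" depends on latency. Rearranging the bandwidth-delay product gives the per-write latency that a single 512K write can afford at a given target rate. This small Python sketch uses MB = 10^6 bytes. The target rates are the figures quoted in this thread.

```python
def max_latency_ms(bytes_in_flight, target_mb_s):
    """Bandwidth-delay product rearranged: the longest per-IO round trip
    that `bytes_in_flight` can tolerate while sustaining `target_mb_s`."""
    return bytes_in_flight / (target_mb_s * 1e6) * 1e3


in_flight = 512 * 1024          # one 512K dd write outstanding
for target in (240, 120, 50):   # MB/s figures mentioned in this thread
    budget = max_latency_ms(in_flight, target)
    print(f"{target:4d} MB/s needs per-write latency <= {budget:5.2f} ms")
```

At 240 MB/s the budget is about 2.2 ms per write. So the "latency no longer matters" argument holds for a target that acknowledges quickly, but not once each write takes tens of milliseconds.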
I'm curious if the iSCSI number changes if you add oflag=direct or
It's also worth pointing out that what dd is doing (a single outstanding IO)
is something no sane file system would do, except perhaps at commit/sync
time when it is carefully ordering IOs. You might want to try the dd to a
file inside a mounted fs instead of to the raw device.
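A dd-style writer can be sketched in Python to compare the raw-device and in-filesystem cases on equal terms. It issues one write() at a time and can optionally fsync before stopping the clock, similar in spirit to dd's conv=fsync. The function name, defaults and target directory are illustrative assumptions; point the path at a directory on the filesystem under test. Without the fsync, the result can largely reflect the page cache instead of the disk.

```python
import os
import tempfile
import time


def dd_like_write(path, block_size=512 * 1024, count=64, fsync=True):
    """Hypothetical helper: write `count` blocks of `block_size` bytes
    sequentially, one write() call at a time (as dd does), optionally
    fsync at the end. Returns (bytes_written, elapsed_seconds)."""
    buf = b"\0" * block_size
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        for _ in range(count):
            # os.write may write fewer bytes than asked; loop until the block is done.
            view = memoryview(buf)
            while view:
                n = os.write(fd, view)
                view = view[n:]
            written += block_size
        if fsync:
            # Don't stop the clock until the data has been flushed to stable storage.
            os.fsync(fd)
    finally:
        os.close(fd)
    return written, time.perf_counter() - start


if __name__ == "__main__":
    # Replace the temporary directory with one on the filesystem under test.
    with tempfile.TemporaryDirectory() as d:
        nbytes, secs = dd_like_write(os.path.join(d, "ddtest.bin"))
        print(f"{nbytes} bytes in {secs:.3f} s, {nbytes / secs / 1e6:.1f} MB/s")
```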
Well, I created a hidden directory in one of the ceph data stores and
copied a file into it to see what dd would do in that case. Note that this
is with 512-byte blocks, from one XFS filesystem to another:
[root@storage1 .t]# dd if=/export/home1/linux.tgz of=linux.tgz
9177080+0 records in
9177080+0 records out
4698664960 bytes (4.7 GB) copied, 15.5054 s, 303 MB/s
[root@storage1 .t]#
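The dd figures are self-consistent. 9177080 records of dd's default 512-byte block size give exactly the reported byte count, and dividing by the elapsed time reproduces the 303 MB/s (dd uses MB = 10^6 bytes). The same arithmetic works out to roughly 590,000 write() calls per second. That suggests the page cache and filesystem are batching these small writes, not sending each one to disk individually. A short Python check:

```python
records = 9177080      # "9177080+0 records out"
block = 512            # dd's default block size
elapsed_s = 15.5054

total_bytes = records * block
rate_mb_s = total_bytes / elapsed_s / 1e6   # dd reports MB/s as 10**6 bytes/s
calls_per_s = records / elapsed_s           # one write() per record

assert total_bytes == 4698664960            # matches "4698664960 bytes (4.7 GB)"
print(f"{total_bytes} bytes, {rate_mb_s:.0f} MB/s, {calls_per_s:,.0f} write() calls/s")
```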
That's about what I expect, reading from one SAS channel and writing
to the other (3 Gbit/s SAS channels). I've benchmarked this combination
before for streaming writes, and I can sustain that bandwidth pretty much
indefinitely.
My conclusion at the moment is that a) ceph isn't a good match for my
infrastructure, since it really wants its own dedicated hardware with no
RAID, and b) even there I should not expect much more for single-stream
writes than what I'm seeing above, though aggregate write performance
should scale. Unfortunately, (b) doesn't fit my workload, where aggregate
bandwidth requirements are modest but burst bandwidth requirements are high.
ceph-users mailing list