I've set up Ceph with a single osd and mon spread over two SSDs (Intel 520s): a 2G journal on one and the osd data (xfs filesystem) on the other. The Intels are pretty fast and, despite being shackled by a crappy Nvidia SATA controller, fly for random IO.
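For reference, the journal/data split looks something like this in ceph.conf (an illustrative sketch only - the attached ceph.conf.gz is authoritative, and the journal path here is made up; /data1/ceph/1 matches the direct-SSD test further down):

[osd.0]
        ; osd data dir on the xfs SSD
        osd data = /data1/ceph/1
        ; 2G journal on the other SSD
        osd journal = /journal/osd.0.journal
        osd journal size = 2048     ; MB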
However, I am not seeing that reflected in the RBD case. I have the device mounted on the local machine where the osd and mon are running, so network performance should not be a factor here.
Here is what I did. Create an rbd device of 10G and mount it on /mnt/vol0:

$ rbd create --size 10240 vol0
$ rbd map vol0
$ mkfs.xfs /dev/rbd0
$ mount /dev/rbd0 /mnt/vol0

Make a file:

$ dd if=/dev/zero of=/mnt/vol0/dump/file bs=4k count=300000 conv=fsync
1228800000 bytes (1.2 GB) copied, 13.4361 s, 91.5 MB/s

Performance is ok if the file size is < the journal (2G):

$ dd if=/dev/zero of=/mnt/vol0/dump/file bs=4096k count=200 conv=fsync
838860800 bytes (839 MB) copied, 9.47086 s, 88.6 MB/s

Not so good if the file size is > the journal:

$ dd if=/dev/zero of=/mnt/vol0/dump/file bs=4096k count=1000 conv=fsync
4194304000 bytes (4.2 GB) copied, 279.891 s, 15.0 MB/s

Random writes (see attached writetest.c) synced with sync_file_range are ok if the block size is big:
$ ./writetest /mnt/vol0/dump/file 4194304 0 1
random writes: 292 of: 4194304 bytes elapsed: 9.8397s io rate: 30/s (118.70 MB/s)

$ ./writetest /mnt/vol0/dump/file 1048576 0 1
random writes: 1171 of: 1048576 bytes elapsed: 10.6042s io rate: 110/s (110.43 MB/s)

$ ./writetest /mnt/vol0/dump/file 131072 0 1
random writes: 9375 of: 131072 bytes elapsed: 15.8075s io rate: 593/s (74.13 MB/s)
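In case the attachment doesn't make it through: writetest boils down to a loop like the simplified sketch below. This is not the attached code verbatim (the real program takes two more arguments, and the file size here is assumed to be 1G), it just shows the technique - pwrite a random block, push it to stable storage with sync_file_range(), count IOPS:

/* sketch.c - simplified sync_file_range random-write loop,
 * NOT the attached writetest.c. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <blocksize>\n", argv[0]);
        return 1;
    }
    size_t bs = (size_t)atol(argv[2]);
    long nblocks = ((off_t)1 << 30) / bs;   /* assume a 1G target file */

    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(bs);
    memset(buf, 'x', bs);

    double start = now(), elapsed = 0;
    long writes = 0;
    while (elapsed < 10.0) {                /* run for ~10 seconds */
        off_t off = (off_t)(random() % nblocks) * bs;
        if (pwrite(fd, buf, bs, off) != (ssize_t)bs) {
            perror("pwrite");
            return 1;
        }
        /* write out this range and wait for it to hit the device */
        sync_file_range(fd, off, bs,
                        SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
        writes++;
        elapsed = now() - start;
    }
    printf("random writes: %ld of: %zu bytes elapsed: %.4fs "
           "io rate: %.0f/s (%.2f MB/s)\n", writes, bs, elapsed,
           writes / elapsed, writes * (double)bs / elapsed / (1 << 20));
    return 0;
}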
However, a smallish block size is suicide (it triggers the suicide assert after a while); I see 100 IOPS or less on the actual devices, all at 100% util:
$ ./writetest /mnt/vol0/dump/file 8192 0 1

I am running into http://tracker.newdream.net/issues/2784 here, I think. Note that the actual SSDs are very fast for this when accessed directly:

$ ./writetest /data1/ceph/1/file 8192 0 1
random writes: 1000000 of: 8192 bytes elapsed: 125.7907s io rate: 7950/s (62.11 MB/s)
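(The per-device IOPS and %util figures above come from watching the disks during the runs, e.g. with sysstat's

$ iostat -x 1

and reading off the r/s, w/s and %util columns.)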
Thanks for your patience in reading so far - some actual questions now :-)

1/ Why is the appending write from dd so slow when the file size is > the journal, despite reasonably capable storage devices?
2/ Is the sudden dramatic drop in random write performance a manifestation of the "small requests are slow" issue, or is this something else?
Thanks,
Mark
Attachments: ceph.conf.gz, writetest.c.gz