On 08/19/2013 06:28 AM, Da Chun Ng wrote:
Sounds like readahead and/or caching is helping out a lot here. Btw, you might want to make sure this is actually coming from the disks with iostat or collectl or something.
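Something like this during the benchmark would tell you (just a sketch; watch whichever devices back your OSDs):

    # Extended per-device stats at 1-second intervals. If r/s and rkB/s
    # stay near zero on the OSD disks while the client reports high
    # throughput, the data is being served from cache, not the disks.
    iostat -x -d 1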
One thing to keep in mind is that unless you have SSDs in this system, you will be doing 2 writes for every client write to the spinning disks (since data and journals will both be on the same disk). So let's do the math:

6618.2 KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB->bytes) / 16384 (write size in bytes) / 15 drives = ~165 IOPS/drive

If there is no write coalescing going on, ~165 16K IOPS per drive isn't terrible. If there is coalescing, the drives are doing fewer, larger IOs and still only hitting this throughput, which is terrible. Have you tried buffered writes with the sync engine at the same IO size?
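Something like this fio invocation would do it (a sketch; the target directory and size are placeholders):

    # Buffered 16K sequential writes through the sync engine
    fio --name=buffered-write --directory=/mnt/cephfs \
        --rw=write --bs=16k --size=1g \
        --ioengine=sync --direct=0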
In this case:

11087 KB/s * 1024 (KB->bytes) / 16384 (read size in bytes) / 15 drives = ~46 IOPS/drive

Definitely not great! You might want to try fiddling with readahead, both on the CephFS client and on the block devices under the OSDs themselves (sketched below). One thing I did notice back during bobtail is that increasing the number of osd op threads seemed to help small object read performance; it might be worth looking at too:

http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/#4kbradosread

Other than that, if you really want to dig into this, you can use tools like iostat, collectl, blktrace, and seekwatcher to try to get a feel for what the IO going to the OSDs looks like. That can help when diagnosing this sort of thing.
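To be concrete about those knobs (values are illustrative rather than recommendations, and the device name is a placeholder):

    # Readahead on a block device under an OSD (in 512-byte sectors)
    blockdev --setra 4096 /dev/sdb

    # Readahead on the CephFS kernel client, set at mount time
    # (rasize is in bytes; availability depends on your client/kernel version)
    mount -t ceph mon-host:/ /mnt/cephfs -o rasize=4194304

    # In ceph.conf under [osd] -- the default was 2 back then
    osd op threads = 4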
6001.1 KB/s * 3 replication * 2 (journal + data writes) * 1024 (KB->bytes) / 16384 (write size in bytes) / 15 drives = ~150 IOPS/drive
7200 RPM spinning disks typically top out at something like 150 IOPS (and some are lower). With 15 disks, hitting 4127 IOPS works out to ~275 IOPS per drive, so you were probably seeing some write coalescing effects (or, if these were random reads, some benefit from readahead).

I don't know what kind of controller you have, but in cases where journals are on the same disks as the data, using writeback cache helps a lot, because the controller can coalesce the direct IO journal writes in cache and just do big periodic dumps to the drives. That really reduces seek overhead for the writes. Using SSDs for the journals accomplishes much the same thing, and lets you get faster large IO writes too, but in many chassis there is a density (and cost) trade-off.

Hope this helps!

Mark
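P.S. For reference, pointing an OSD's filestore journal at an SSD is just a ceph.conf setting, roughly like this (the partition path is a placeholder; each OSD gets its own partition):

    [osd.0]
    osd journal = /dev/sdg1    # dedicated SSD partition for this OSD's journal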