Re: large reads become 512 kbyte reads on qemu-kvm rbd

Dan, are you setting this on the guest VM side? Did you run any tests to see whether this impacts performance, e.g. small-block performance?

Cheers




From: "Dan Van Der Ster" <daniel.vanderster@xxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 November, 2014 1:33:20 PM
Subject: Re: large reads become 512 kbyte reads on qemu-kvm rbd

Hi,
After some more tests we’ve found that max_sectors_kb is the reason for splitting large IOs.
We increased it to 4MB:
   echo 4096 > /sys/block/vdb/queue/max_sectors_kb
and now fio and iostat show reads of up to 4MB getting through to the block device unsplit.
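
For reference, the splitting is easy to watch with iostat: avgrq-sz is reported in 512-byte sectors, so 512 kB requests show up as roughly 1024 and 4 MB requests as roughly 8192. A generic invocation (not our exact command line):

   iostat -x 1 vdb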

We use 4MB to match the size of the underlying RBD objects. I can’t think of a reason to split IOs into pieces smaller than an RBD object -- with the default 512 kB max_sectors_kb the client needs 8 IOs just to read a single 4MB object.
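
If you want to double-check an image’s object size, rbd info prints it as the order. The image name below is just a placeholder, and the output should look roughly like this:

   rbd info rbd/myimage | grep order
       order 22 (4096 kB objects)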

Does anyone know of a reason that max_sectors_kb should not be set to the RBD object size? Is there any udev rule or similar that could set max_sectors_kb when a RBD device is attached?
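
Something along these lines ought to work as a guest-side udev rule -- an untested sketch, with the rules filename just an example. Note that inside the guest it would apply to every virtio disk, not only RBD-backed ones, since the guest can’t tell them apart:

cat > /etc/udev/rules.d/99-vd-max-sectors.rules <<'EOF'
# raise max_sectors_kb for virtio disks (guest side, sketch)
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="vd[a-z]", ATTR{queue/max_sectors_kb}="4096"
EOF

Then a udevadm trigger --subsystem-match=block (or a reboot) should apply it to already-attached disks.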

Cheers, Dan


On 27 Nov 2014, at 20:29, Dan Van Der Ster <daniel.vanderster@xxxxxxx> wrote:

Oops, I was off by a factor of 1000 in my original subject. We actually have 4M and 8M reads being split into 512kB reads, about 100 of them per second. So perhaps these are limiting:

# cat /sys/block/vdb/queue/max_sectors_kb
512
# cat /sys/block/vdb/queue/read_ahead_kb
512
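
Before raising max_sectors_kb it’s also worth checking the hard ceiling the driver advertises, since max_sectors_kb cannot be set above it:

   cat /sys/block/vdb/queue/max_hw_sectors_kb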

Questions below remain.

Cheers, Dan

On 27 Nov 2014 18:26, Dan Van Der Ster <daniel.vanderster@xxxxxxx> wrote:
Hi all,
We throttle (with qemu-kvm) rbd devices to 100 w/s and 100 r/s (and 80MB/s write and read).
With fio we cannot exceed 51.2MB/s for sequential or random reads, no matter the read block size. (With large writes we can reach 80MB/s.)
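
For what it’s worth, 51.2MB/s works out to exactly 100 x 512 kB per second. A rough example of the kind of fio read job involved (the parameters and test file are illustrative, not our exact job):

   fio --name=seqread --filename=/vicepa/fio.test --size=4G \
       --rw=read --bs=4M --direct=1 --ioengine=libaio --iodepth=1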

I just realised that the VM subsystem is probably splitting large reads into 512 byte reads, following at least one of:

# cat /sys/block/vdb/queue/hw_sector_size
512
# cat /sys/block/vdb/queue/minimum_io_size
512
# cat /sys/block/vdb/queue/optimal_io_size
0
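
A quick convenience loop to dump the relevant queue limits in one go (vdb as above):

   for f in hw_sector_size minimum_io_size optimal_io_size \
            max_sectors_kb max_hw_sectors_kb read_ahead_kb; do
       echo -n "$f: "; cat /sys/block/vdb/queue/$f
   done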

vdb is an RBD device coming over librbd, with rbd cache=true and mounted like this:

  /dev/vdb on /vicepa type xfs (rw)

Did anyone observe this before?

Is there a kernel setting to stop splitting reads like that, or a way to change the io_sizes that RBD reports to the kernel?

(I found a similar thread on the LVM mailing list, but LVM shouldn’t be involved here.)

All components here are running latest dumpling. Client VM is running CentOS 6.6.

Cheers, Dan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
