Hi Andrei,
Yes, I’m testing from within the guest.
Here is an example. First, I do 2MB reads with max_sectors_kb=512, and each read is split into four 512kB requests (fio sees 25 iops, while iostat reports 100 smaller iops):
# echo 512 > /sys/block/vdb/queue/max_sectors_kb # this is the default
# fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
/dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0 iops] [eta 00m:00s]
Meanwhile, iostat reports 100 iops with an average request size of 1024 sectors (i.e. 512kB):
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vdb 0.00 0.00 100.00 0.00 50.00 0.00 1024.00 3.02 30.25 10.00 100.00
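(To spell out the arithmetic: 2MB / 512kB = 4 splits per request, so fio's 25 iops x 4 = 100 iops at the device, and 100 iops x 512kB = 51.2MB/s, i.e. the 51200K/s fio reports above.)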
Now increase the max_sectors_kb to 4MB, and the IOs are no longer split:
# echo 4096 > /sys/block/vdb/queue/max_sectors_kb
# fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
/dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0 iops] [eta 00m:00s]
iostat now reports 100 iops of 4096 sectors each (i.e. 2MB):
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
vdb 300.00 0.00 100.00 0.00 200.00 0.00 4096.00 0.99 9.94 9.94 99.40
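(The arithmetic again: 100 iops x 2MB = 200MB/s, matching the 200.0M/s fio shows above.)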
Cheers, Dan
Dan, are you setting this on the guest VM side? Did you run any tests to see whether this impacts performance, e.g. small block size performance?
Cheers
From: "Dan Van Der Ster" <daniel.vanderster@xxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 November, 2014 1:33:20 PM
Subject: Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd
Hi,
After some more tests we’ve found that max_sectors_kb is the reason for splitting large IOs.
We increased it to 4MB:
echo 4096 > /sys/block/vdb/queue/max_sectors_kb
and now fio/iostat are showing reads up to 4MB are getting through to the block device unsplit.
We use 4MB to match the size of the underlying RBD objects. I can’t think of a reason to split IOs smaller than the RBD objects -- with the default 512kB max_sectors_kb the client needs 8 IOs to read a single 4MB object.
Does anyone know of a reason that max_sectors_kb should not be set to the RBD object size? Is there any udev rule or similar that could set max_sectors_kb when a RBD device is attached?
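Something like this untested sketch is what I have in mind; it matches every virtio disk (vd*) in the guest, so the match would need tightening if only some of them are RBD-backed, and the filename is just a placeholder:
# /etc/udev/rules.d/99-rbd-max-sectors.rules (hypothetical)
ACTION=="add|change", KERNEL=="vd[a-z]", SUBSYSTEM=="block", ATTR{queue/max_sectors_kb}="4096"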
Cheers, Dan
Oops, I was off by a factor of 1000 in my original subject. We actually have 4M and 8M reads being split into 512kB reads (100 of them per second). So perhaps one of these settings is the limit:
# cat /sys/block/vdb/queue/max_sectors_kb
512
# cat /sys/block/vdb/queue/read_ahead_kb
512
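(100 reads/s x 512kB = 51.2MB/s, which is exactly the read throughput ceiling we were hitting.)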
Questions below remain.
Cheers, Dan
On 27 Nov 2014 18:26, Dan Van Der Ster <daniel.vanderster@xxxxxxx> wrote:
Hi all,
We throttle (with qemu-kvm) RBD devices to 100 w/s and 100 r/s (and 80MB/s write and read).
With fio we cannot exceed 51.2MB/s of sequential or random reads, no matter the read block size. (With large writes we can reach the full 80MB/s.)
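(For reference, the throttle is applied on the hypervisor side; assuming libvirt drives qemu-kvm here, it is expressed roughly like this in the domain XML via libvirt's <iotune> element, using the numbers above and taking 80MB/s as MiB/s:)
<iotune>
  <read_bytes_sec>83886080</read_bytes_sec>   <!-- 80 MB/s read -->
  <write_bytes_sec>83886080</write_bytes_sec> <!-- 80 MB/s write -->
  <read_iops_sec>100</read_iops_sec>          <!-- 100 r/s -->
  <write_iops_sec>100</write_iops_sec>        <!-- 100 w/s -->
</iotune>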
I just realised that the VM subsystem is probably splitting large reads into 512 byte reads, following at least one of:
# cat /sys/block/vdb/queue/hw_sector_size
512
# cat /sys/block/vdb/queue/minimum_io_size
512
# cat /sys/block/vdb/queue/optimal_io_size
0
vdb is an RBD device coming over librbd, with rbd cache=true and mounted like this:
/dev/vdb on /vicepa type xfs (rw)
Did anyone observe this before?
Is there a kernel setting to stop splitting reads like that, or a way to change the IO sizes reported by RBD to the kernel?
(I found a similar thread on the LVM mailing list, but LVM shouldn’t be involved here.)
All components here are running the latest Dumpling release. The client VM is running CentOS 6.6.
Cheers, Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com