Re: RBD cache being filled up in small increases instead of 4MB

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 14 Jul 2017 16:09:07 -0700



On Fri, Jul 14, 2017 at 3:43 PM, Ruben Rodriguez <ruben@xxxxxxx> wrote:
>
> I'm having an issue with small sequential reads (such as searching
> through source code files, etc), and I found that multiple small reads
> within a 4MB boundary would fetch the same object from the OSD multiple
> times, as it gets inserted into the RBD cache partially.
>
> How to reproduce: RBD image accessed from a QEMU VM using virtio-scsi,
> writethrough cache on. Monitor with perf dump on the rbd client. The
> image is filled up with zeroes in advance. Rbd readahead is off.
>
> 1 - Small read from a previously unread section of the disk:
> dd if=/dev/sdb ibs=512 count=1 skip=41943040 iflag=skip_bytes
> Notes: dd cannot read less than 512 bytes. The skip is arbitrary to
> avoid the beginning of the disk, which would have been read at boot.
>
> Expected outcomes: perf dump should show a +1 increase on values rd,
> cache_ops_miss and op_r. This happens correctly.
> It should show a 4194304 increase in data_read as a whole object is put
> into the cache. Instead it increases by 4096. (not sure why 4096, btw).
>
> 2 - Small read less than 4MB away from the first (in the example, +5000 bytes).
> dd if=/dev/sdb ibs=512 count=1 skip=41948040 iflag=skip_bytes
> Expected outcomes: perf dump should show a +1 increase on cache_ops_hit.
> Instead cache_ops_miss increases.
> It should show a 4194304 increase in data_read as a whole object is put
> into the cache. Instead it increases by 4096.
> op_r should not increase. Instead it increases by one, indicating that
> the object was fetched again.
>
> My tests show that this could be causing a 6 to 20-fold performance loss
> in small sequential reads.
>
> Is it by design that the RBD cache only inserts the portion requested by
> the client instead of the whole last object fetched? Could it be a
> tunable in any of my layers (fs, block device, qemu, rbd...) that is
> preventing this?
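
For reference, the two dd offsets in the steps above can be mapped onto objects and aligned blocks with a little arithmetic. This is only a sketch under two assumptions that the thread does not confirm: that the image uses 4 MiB objects (taken from the "4MB" in the subject), and that data is fetched in aligned 4096-byte units (inferred from the reported data_read increase). The function name locate is hypothetical.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB RBD objects
BLOCK_SIZE = 4096              # assumption: inferred from the data_read delta


def locate(offset, object_size=OBJECT_SIZE, block_size=BLOCK_SIZE):
    """Return (object index, offset within object, aligned block index)."""
    obj, within = divmod(offset, object_size)
    return obj, within, within // block_size


# Byte offsets passed to dd via skip= with iflag=skip_bytes.
print(locate(41943040))  # first read
print(locate(41948040))  # second read, 5000 bytes further on
```

Under these assumptions both reads fall in the same object (index 10), but the first lands in block 0 and the second in block 1 of that object. If only the requested 4096-byte block is cached, a second fetch from the OSD would be consistent with the miss reported in step 2.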

I don't know the exact readahead default values in that stack, but
there's no general reason to think RBD (or any Ceph component) will
read a whole object at a time. In this case, you're asking for 512
bytes and it appears to have turned that into a 4KB read (probably the
virtual block size in use?), which seems pretty reasonable. If you
were asking for 512 bytes out of every 4MB and it was reading 4MB each
time, you'd probably be wondering why you were only getting 1/8192 of the
expected bandwidth. ;)
-Greg
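
The 1/8192 figure in the reply above is the ratio of bytes requested to bytes fetched when a 512-byte read pulls in a whole object. A minimal sketch of that arithmetic follows, again assuming 4 MiB objects; useful_fraction is a hypothetical helper, not part of any Ceph tool.

```python
from fractions import Fraction

OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB RBD objects


def useful_fraction(requested, fetched):
    """Fraction of the fetched bytes that the client actually asked for."""
    return Fraction(requested, fetched)


# 512-byte read served by fetching a whole object.
print(useful_fraction(512, OBJECT_SIZE))
# 512-byte read served by a 4096-byte fetch, as observed in the thread.
print(useful_fraction(512, 4096))
```

With whole-object fetches, one byte in 8192 is useful for this access pattern; with 4096-byte fetches, one byte in 8 is. Which is preferable depends on whether nearby data is read soon afterwards, which is the trade-off the thread is discussing.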

>
> Regards,
> --
> Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8  27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>