I'm having an issue with small sequential reads (such as searching through source code files, etc.), and I found that multiple small reads within a 4MB boundary fetch the same object from the OSD multiple times, because the object is only partially inserted into the RBD cache.

How to reproduce: an RBD image accessed from a QEMU VM using virtio-scsi, with writethrough cache on. Monitor with perf dump on the rbd client. The image is filled with zeroes in advance. RBD readahead is off.

1 - Small read from a previously unread section of the disk:

    dd if=/dev/sdb ibs=512 count=1 skip=41943040 iflag=skip_bytes

Notes: dd cannot read less than 512 bytes. The skip is arbitrary, chosen to avoid the beginning of the disk, which would have been read at boot.

Expected outcomes: perf dump should show a +1 increase in rd, cache_ops_miss and op_r. This happens correctly. It should show a 4194304 increase in data_read, as a whole object is put into the cache. Instead it increases by 4096 (not sure why 4096, btw).

2 - Small read less than 4MB away from the first one (in the example, +5000 bytes):

    dd if=/dev/sdb ibs=512 count=1 skip=41948040 iflag=skip_bytes

Expected outcomes: perf dump should show a +1 increase in cache_ops_hit. Instead cache_ops_miss increases. It should show a 4194304 increase in data_read, as a whole object is put into the cache. Instead it increases by 4096. op_r should not increase. Instead it increases by one, indicating that the object was fetched again.

My tests show that this could be causing a 6- to 20-fold performance loss in small sequential reads.

Is it by design that the RBD cache only inserts the portion requested by the client instead of the whole last object fetched? Could a tunable in any of my layers (fs, block device, qemu, rbd...) be preventing this?

Regards,

-- 
Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
https://fsf.org | https://gnu.org
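For reference, the two dd offsets used in the reproduction can be checked against the object boundaries with a minimal sketch like the one below. It assumes a 4194304-byte (4 MiB) object size and that objects start at multiples of that size from the beginning of the image, with no custom striping. The function names are hypothetical and only for illustration; they are not part of any Ceph API.

```python
# Assumption: 4 MiB objects, laid out back to back from offset 0 of the image.
OBJECT_SIZE = 4 * 1024 * 1024  # 4194304 bytes


def object_index(offset, object_size=OBJECT_SIZE):
    """Return the index of the object that contains the given byte offset."""
    return offset // object_size


def offset_in_object(offset, object_size=OBJECT_SIZE):
    """Return the position of the byte offset inside its object."""
    return offset % object_size


if __name__ == "__main__":
    # The two skip values passed to dd in steps 1 and 2.
    for skip in (41943040, 41948040):
        print(skip, object_index(skip), offset_in_object(skip))
```

Under those assumptions both reads land in the same object (index 10), at positions 0 and 5000, which is why the second read would be expected to hit the cache if the whole object had been inserted.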
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com