It looks like you are using kernel rbd, so rbd_cache has no effect; that
option only applies to librbd. Kernel rbd goes directly through the system
page cache. You said you have already run `echo 3 > /proc/sys/vm/drop_caches`
to invalidate all pages cached in the kernel.

Are you testing /dev/rbd1 through a filesystem such as ext4 or xfs? If so,
and you run a tool like fio with a write test first and file_size = 10G, fio
creates a 10G file that can contain lots of holes. Your read test may then
read those holes, and the filesystem can tell they contain nothing, so there
is no need to access the physical disk to get the data. You can check the
fiemap of the file to see whether it contains holes, or simply remove the
file so it is recreated (and fully laid out) by the read test.

Regards,
Ning Yao

2015-01-31 4:51 GMT+08:00 Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx>:
> I have a cluster and have created an rbd device - /dev/rbd1. It shows up
> as expected with 'rbd --image test info' and 'rbd showmapped'. I have been
> looking at cluster performance with the usual Linux block device tools -
> fio and vdbench. When I look at writes and large block sequential reads
> I'm seeing what I'd expect, with performance limited by either my cluster
> interconnect bandwidth or the backend device throughput - 1 GbE frontend
> and cluster network, and 7200rpm SATA OSDs with 1 SSD/osd for journal.
> Everything looks good EXCEPT 4K random reads. There is caching occurring
> somewhere in my system that I haven't been able to detect and suppress -
> yet.
>
> I've set 'rbd_cache=false' in the [client] section of ceph.conf on the
> client, monitor, and storage nodes. I've flushed the system caches on the
> client and storage nodes before each test run, i.e. vm.drop_caches=3, and
> set the huge pages to the maximum available to consume free system memory
> so that it can't be used for system cache. I've also disabled read-ahead
> on all of the HDD/OSDs.
>
> When I run a 4k random read workload on the client, the most I could
> expect would be ~100 iops/osd x number of osds - I'm seeing an order of
> magnitude greater than that, AND running iostat on the storage nodes shows
> no read activity on the OSD disks.
>
> Any ideas on what I've overlooked? There appears to be some read-ahead
> caching that I've missed.
>
> Thanks,
> Bruce
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
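
The hole check suggested above can be sketched roughly like this (the fio
file path and the demo file are my assumptions, not from the thread; adjust
to wherever your fio test file actually lives):

```shell
# Sketch of the sparse-file check. A real fio file path such as
# /mnt/rbd1/fio.test is an assumption; here we create a demo sparse file.
FILE=${FILE:-demo.sparse}
truncate -s 10M "$FILE"            # demo only: 10M apparent size, nothing written

apparent=$(stat -c %s "$FILE")               # apparent size in bytes
allocated=$(( $(stat -c %b "$FILE") * 512 )) # bytes actually allocated on disk

echo "apparent=$apparent allocated=$allocated"
# allocated much smaller than apparent => the file is sparse; reads of the
# holes are answered by the filesystem without ever touching the OSDs.

# filefrag -v "$FILE"   # (e2fsprogs) prints the extent map / fiemap directly
```

If the file does turn out to be sparse, remove it and either let fio lay it
out again before the read test, or prefill it with an explicit write job
(e.g. `fio --name=prefill --rw=write --bs=1M --size=10g --filename=$FILE`)
so that every block of the file is backed by real data on the OSDs.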