Re: RBD caching on 4K reads???

Yes, I'm using the kernel rbd in Ubuntu 14.04, which makes calls into libceph.

root@essperf3:/etc/ceph# lsmod | grep rbd
rbd                    63707  1 
libceph               225026  1 rbd
root@essperf3:/etc/ceph#

I'm doing raw device I/O with either fio or vdbench (my preferred tool), and there is no filesystem on top of /dev/rbd1. Yes, I did invalidate the kernel's cached pages by writing to drop_caches, and I've also allocated huge pages up to the maximum allowed by free memory, which should minimize the memory available for system caches. My storage pool is relatively small since this is a development environment: there is only ~4TB total, and the rbd image is 3TB. On my lab system with 320TB I don't see this problem, since the data set is orders of magnitude larger than the available system cache.
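
One way to rule out the client page cache for raw-device reads is to open the device with O_DIRECT, which is what fio's direct=1 option requests. The following is only a minimal sketch, assuming Linux and Python 3.7 or later. The helper names random_aligned_offsets and direct_read are hypothetical, not part of any tool mentioned in this thread.

```python
import mmap
import os
import random

BLOCK = 4096  # 4K I/O size, matching the random-read test in this thread


def random_aligned_offsets(device_bytes, count, block=BLOCK, seed=0):
    """Return `count` block-aligned offsets that fit inside the device."""
    rng = random.Random(seed)
    blocks = device_bytes // block
    return [rng.randrange(blocks) * block for _ in range(count)]


def direct_read(path, offsets, block=BLOCK):
    """Read one block at each offset with O_DIRECT and return bytes read.

    O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, block)
    try:
        return sum(os.preadv(fd, [buf], off) for off in offsets)
    finally:
        buf.close()
        os.close(fd)
```

Run as root, something like direct_read("/dev/rbd1", random_aligned_offsets(size_in_bytes, 1000)) would issue 1000 uncached 4K reads, where size_in_bytes is a placeholder for the real device size.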

Maybe I should try testing after removing DIMMs from the client system, to physically restrict kernel caching.
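
For reference, the sanity check behind this thread (roughly 100 IOPS per 7200rpm OSD times the number of OSDs, from the original post quoted below) can be sketched as follows. The function names and the example OSD count are assumptions for illustration. The thread does not state how many OSDs the cluster has.

```python
def expected_random_read_iops(num_osds, iops_per_osd=100):
    """Rough ceiling for uncached 4K random reads across all OSDs."""
    return num_osds * iops_per_osd


def looks_cached(measured_iops, num_osds, iops_per_osd=100, factor=10):
    """True if measured IOPS is at least `factor` times the disk-bound ceiling."""
    ceiling = expected_random_read_iops(num_osds, iops_per_osd)
    return measured_iops >= factor * ceiling


# Hypothetical example: 12 OSDs give a ceiling of about 1200 IOPS.
print(expected_random_read_iops(12))
```

A measured result an order of magnitude above this ceiling, combined with no read activity on the OSD disks, suggests the reads are not reaching the disks.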

-----Original Message-----
From: Nicheal [mailto:zay11022@xxxxxxxxx] 
Sent: Monday, February 02, 2015 7:35 PM
To: Bruce McFarland
Cc: ceph-users@xxxxxxxx; Prashanth Nednoor
Subject: Re:  RBD caching on 4K reads???

It seems you are using the kernel rbd, so rbd_cache has no effect; it is designed only for librbd. The kernel rbd uses the system page cache directly. You said that you have already run something like echo 3 > /proc/sys/vm/drop_caches to invalidate all pages cached in the kernel. So are you testing /dev/rbd1 through a filesystem, such as ext4 or xfs?
If so, and you run a test tool like fio, first with a write test and file_size = 10G, then fio creates a 10G file but with lots of holes in it. Your read test may then read those holes; the filesystem can tell that they contain nothing, so there is no need to access the physical disk to get the data. You can check the fiemap of the file to see whether it contains holes, or just remove the file and let a read test recreate it.
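
As an alternative to inspecting the fiemap, holes in a test file can also be listed with lseek and SEEK_HOLE/SEEK_DATA. This is a different mechanism from fiemap, shown here only as a minimal sketch assuming Linux and a filesystem that reports holes. The function name find_holes is hypothetical.

```python
import errno
import os


def find_holes(path):
    """Return a list of (start, end) byte ranges that are holes in the file."""
    holes = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            # Next hole at or after pos; equals the file size if there is none.
            hole = os.lseek(fd, pos, os.SEEK_HOLE)
            if hole >= size:
                break
            try:
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
                data = size  # the hole runs to the end of the file
            holes.append((hole, data))
            pos = data
        return holes
    finally:
        os.close(fd)
```

An empty list means every byte range is backed by data, so reads of that file cannot be satisfied from holes.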

Ning Yao

2015-01-31 4:51 GMT+08:00 Bruce McFarland <Bruce.McFarland@xxxxxxxxxxxxxxxx>:
> I have a cluster and have created a rbd device - /dev/rbd1. It shows 
> up as expected with ‘rbd --image test info’ and rbd showmapped. I have 
> been looking at cluster performance with the usual Linux block device 
> tools – fio and vdbench. When I look at writes and large block 
> sequential reads I’m seeing what I’d expect with performance limited 
> by either my cluster interconnect bandwidth or the backend device 
> throughput speeds – 1 GE frontend and cluster network and 7200rpm SATA 
> OSDs with 1 SSD/osd for journal. Everything looks good EXCEPT 4K 
> random reads. There is caching occurring somewhere in my system that I haven’t been able to detect and suppress - yet.
>
>
>
> I’ve set ‘rbd_cache=false’ in the [client] section of ceph.conf on the 
> client, monitor, and storage nodes. I’ve flushed the system caches on 
> the client and storage nodes before the test run (i.e. vm.drop_caches=3) and 
> set the huge pages to the maximum available to consume free system 
> memory so that it can’t be used for system cache. I’ve also disabled 
> read-ahead on all of the HDD/OSDs.
>
>
>
> When I run a 4k random read workload on the client the most I could 
> expect would be ~100 IOPS/OSD x number of OSDs. I’m seeing an order 
> of magnitude greater than that AND running iostat on the storage nodes 
> shows no read activity on the OSD disks.
>
>
>
> Any ideas on what I’ve overlooked? There appears to be some read-ahead 
> caching that I’ve missed.
>
>
>
> Thanks,
>
> Bruce
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>




