On Fri, May 11, 2018 at 3:59 AM, Marc Schöchlin <ms@xxxxxxxxxx> wrote:
> Hello Jason,
>
> thanks for your response.
>
>
> On 10.05.2018 at 21:18, Jason Dillaman wrote:
>
> If I configure caches as described at
> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
> caches per rbd-nbd/krbd device, or is there only a single cache area?
>
> The librbd cache is per device, but if you aren't performing direct
> IOs to the device, you would also have the unified Linux pagecache on
> top of all the devices.
>
> XenServer directly utilizes nbd devices, which in my understanding are
> connected to the virtual machines by blkback (dom-0) and blkfront (dom-U).
> In my understanding the pagecache only comes into play if I use data on
> mounted filesystems (VFS usage).
> Therefore it would be a good thing to use the rbd cache for rbd-nbd (/dev/nbdX).

I cannot speak for Xen, but in general IO to a block device will hit
the pagecache unless the IO operation is flagged as direct (e.g.
O_DIRECT) to bypass the pagecache and send it directly to the block
device.

> How can I identify the rbd cache with the tools provided by the operating
> system?
>
> Identify how? You can enable the admin sockets and use "ceph
> --admin-daemon config show" to display the in-use settings.
>
>
> Ah ok, I discovered that I can gather configuration settings by executing:
> (xen_test is the identity of the xen rbd_nbd user)
>
> ceph --id xen_test --admin-daemon /var/run/ceph/ceph-client.xen_test.asok
> config show | less -p rbd_cache
>
> Sorry, my question was a bit imprecise: I was searching for usage statistics
> of the rbd cache.
> Is there also a way to gather rbd_cache usage statistics as a source
> of verification for optimizing the cache settings?

You can run "perf dump" instead of "config show" to dump out the
current performance counters. There are some stats from the in-memory
cache included in there.
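As a rough illustration of the "perf dump" suggestion above, the following Python sketch runs the command against the admin socket and keeps only counters whose name contains "cache". The helper names (perf_dump, filter_counters) are hypothetical, and the "cache" substring filter is an assumption, since the thread does not name the individual counters.

```python
import json
import subprocess


def filter_counters(perf, needle="cache"):
    """Return {section: {counter: value}} for counter names containing needle.

    The default needle is an assumption; adjust it after inspecting the
    full "perf dump" output of your own client.
    """
    matches = {}
    for section, counters in perf.items():
        if not isinstance(counters, dict):
            continue  # skip anything that is not a section of counters
        hits = {name: value for name, value in counters.items() if needle in name}
        if hits:
            matches[section] = hits
    return matches


def perf_dump(asok_path, client_id):
    """Run "perf dump" against a client admin socket and parse its JSON output."""
    out = subprocess.check_output(
        ["ceph", "--id", client_id, "--admin-daemon", asok_path, "perf", "dump"]
    )
    return json.loads(out)
```

With the socket path and identity quoted earlier in the thread, a call would look like filter_counters(perf_dump("/var/run/ceph/ceph-client.xen_test.asok", "xen_test")). Sampling this before and after a test run gives a before/after comparison for a given cache configuration.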

> Since an rbd cache is created for every device, I assume that
> the rbd cache is simply part of the rbd-nbd process memory.

Correct.

> Can you provide some hints about adequate cache settings for a
> write-intensive environment (70% write, 30% read)?
> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
> of 10 seconds?

Depends on your workload and your testing results. I suspect a
database on top of RBD is going to do its own read caching and will be
issuing lots of flush calls to the block device, potentially negating
the need for a large cache.

> The librbd cache is really only useful for sequential read-ahead and
> for small writes (assuming writeback is enabled). Assuming you aren't
> using direct IO, I'd suspect your best performance would be to disable
> the librbd cache and rely on the Linux pagecache to work its magic.
>
> As described, XenServer directly utilizes the nbd devices.
>
> More than 70 percent of our typical workload consists of database write
> operations in the virtual machines.
> Therefore collecting write operations in the rbd cache and writing them in
> chunks to ceph might be a good thing.
> A higher limit for "rbd cache max dirty" might be adequate here.
> On the other hand, our read workload typically reads huge files
> sequentially.
>
> Therefore it might be useful to start with a configuration like this:
>
> rbd cache size = 64MB
> rbd cache max dirty = 48MB
> rbd cache target dirty = 32MB
> rbd cache max dirty age = 10
>
> What is the strategy of librbd for writing data from the rbd cache to the
> storage once "rbd cache max dirty = 48MB" is reached?
> Is there a reduction of IO operations (merging of IOs) compared to the
> granularity of the writes of my virtual machines?

If the cache is full, incoming IO will be stalled while dirty data is
written back to the backing RBD image to make room for the new IO
request.
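The proposed settings above can be sanity-checked before they go into ceph.conf. The following Python sketch is only an illustration: the helper name rbd_cache_conf is hypothetical, writing the sizes as plain byte counts is a conservative assumption rather than something stated in the thread, and the ordering check (target dirty <= max dirty < cache size) follows the documented rule that max dirty must be smaller than the cache size, with the target dirty bound added as an assumption.

```python
MIB = 1024 * 1024


def rbd_cache_conf(size_mib, max_dirty_mib, target_dirty_mib, max_dirty_age_s):
    """Build a [client] fragment after sanity-checking the thresholds.

    Sizes are taken in MiB and emitted as bytes (an assumption made here
    to avoid relying on unit suffixes being parsed).
    """
    if not 0 <= target_dirty_mib <= max_dirty_mib < size_mib:
        raise ValueError("expected target dirty <= max dirty < cache size")
    return "\n".join([
        "[client]",
        f"rbd cache size = {size_mib * MIB}",
        f"rbd cache max dirty = {max_dirty_mib * MIB}",
        f"rbd cache target dirty = {target_dirty_mib * MIB}",
        f"rbd cache max dirty age = {max_dirty_age_s}",
    ])
```

Calling rbd_cache_conf(64, 48, 32, 10) yields the values from the quoted proposal expressed in bytes; whether those values help is still a matter of testing against the actual workload.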

> Additionally, I would not apply any non-default readahead settings at the
> nbd level, so that readahead can still be configured at the operating
> system level of the VMs.
>
> The operating systems in our virtual machines currently use a readahead of
> 256 sectors (256*512 bytes = 128KB).
> From my point of view it would be a good thing for sequential reads of big
> files to increase the readahead to a higher value.
> We haven't changed the default rbd object size of 4MB - nevertheless it
> might be a good thing to increase the readahead to 1024 (=512KB) to decrease
> read requests by a factor of 4 for sequential reads.
>
> What do you think about this?

Depends on your workload.

> Regards
> Marc
>

--
Jason
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
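The readahead arithmetic quoted in this message (256 sectors of 512 bytes = 128KB, 1024 sectors = 512KB, 4MB objects) can be checked with a few lines of Python. This is an idealized sketch: the function names are hypothetical, and it assumes every sequential read request is exactly one readahead window in size, which real workloads will only approximate.

```python
SECTOR_BYTES = 512           # readahead values are given in 512-byte sectors
OBJECT_BYTES = 4 * 1024 * 1024  # default rbd object size mentioned in the thread


def readahead_bytes(sectors):
    """Convert a readahead setting in sectors to bytes."""
    return sectors * SECTOR_BYTES


def requests_per_object(sectors, object_bytes=OBJECT_BYTES):
    """Readahead-sized requests needed to read one object sequentially."""
    window = readahead_bytes(sectors)
    return -(-object_bytes // window)  # ceiling division


current = requests_per_object(256)    # 128 KiB window
proposed = requests_per_object(1024)  # 512 KiB window
print(current, proposed, current // proposed)
```

Under these assumptions one 4MB object takes 32 requests at a readahead of 256 and 8 requests at 1024, which matches the factor of 4 mentioned in the quoted text.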