Re: Bluestore caching oddities, again


 



On 8/4/19 6:09 AM, Paul Emmerich wrote:

On Sun, Aug 4, 2019 at 3:47 AM Christian Balzer <chibi@xxxxxxx> wrote:

2. Bluestore caching still broken
When writing data with the fios below, it isn't cached on the OSDs.
Worse, existing cached data that gets overwritten is removed from the
cache, which while of course correct can't be free in terms of allocation
overhead.
Why not do what any sensible person would expect from experience with
any other cache there is: cache writes in case the data gets read again
soon, and in case of overwrites reuse the existing allocations?
This is by design.
BlueStore only populates its cache on reads, not on writes. The idea is
that a reasonable application does not read data it just wrote (and if it does,
it's already cached at a higher layer like the page cache or a cache on the
hypervisor).


Note that this behavior can be changed by setting bluestore_default_buffered_write = true.
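
To make the difference concrete, here is a toy model of the two policies discussed above. It is only a sketch and not BlueStore code: the class ToyBufferCache, its buffered_write flag (a hypothetical stand-in for bluestore_default_buffered_write), and the dict used as a "disk" are all assumptions made for illustration.

```python
from collections import OrderedDict


class ToyBufferCache:
    """Toy LRU cache in front of a dict-backed 'disk'. Not BlueStore code."""

    def __init__(self, capacity, buffered_write=False):
        self.capacity = capacity
        # Hypothetical stand-in for bluestore_default_buffered_write.
        self.buffered_write = buffered_write
        self.disk = {}
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def write(self, key, value):
        self.disk[key] = value
        if self.buffered_write:
            # Cache-on-write: the new data is kept in case it is read back.
            self._insert(key, value)
        else:
            # Read-populated cache: an overwrite only drops the stale entry.
            self.cache.pop(key, None)

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1
        value = self.disk[key]
        self._insert(key, value)  # populate the cache on read
        return value


def run(buffered_write):
    """Write four objects, read each back once, return (hits, misses)."""
    cache = ToyBufferCache(capacity=4, buffered_write=buffered_write)
    for i in range(4):
        cache.write(i, f"v{i}")
    for i in range(4):
        cache.read(i)
    return cache.hits, cache.misses
```

In this model, run(False) misses on every read-back while run(True) hits on every one. The model ignores the CPU, locking, and memory costs of inserting on every write, so it says nothing about which setting is faster on a real OSD.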


FWIW, there's also a CPU usage and lock contention penalty for default buffered write when using extremely fast flash storage.  A lot of my recent work on improving cache performance and intelligence in bluestore is aimed at reducing contention in the onode/buffer cache and at significantly reducing the impact of bluestore_default_buffered_write = true.  The PriorityCacheManager was a big one, to do a better job of autotuning. Another big one that recently merged was refactoring bluestore's caches to trim on write (better memory behavior, shorter more frequent trims, trims distributed across threads) and to no longer share a single lock between the onode and buffer cache:


https://github.com/ceph/ceph/pull/28597


Ones still coming down the pipe are avoiding double caching of onodes in both the bluestore onode cache and the rocksdb block cache, and age-binning the LRU caches to better redistribute memory between caches based on relative age.  The latter is the piece that hopefully would let you cache on write while still having the priority of those cached writes quickly fall off if they are never read back (the more cache you have, the more effective this would be at keeping the onode/omap ratios relatively higher).


Mark





Paul
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



