== Background
In the event of a nearly full disk or the consumption of most of our
available inodes, cachefiles will instruct the daemon to begin culling
files in the order of oldest-to-newest in order to free up resources.
This can be quite slow for any appreciably large cache.
== First-pass improvement
David Howells produced a time ago part of a patch to improve this scheme
by introducing into cachefiles a series of page-sized bitmaps which
associated an ordinal index or slot number to each fscache object. In
this scheme, each cachefiles file representation of a cache object is
given an xattribute which indicates its slot number in the bitmap.
A "culling index" file is produced which contains (on an x86_64 machine)
an array of 26 byte data structures which contain exported file handles
to the cache files. The position of each structure correlates with the
slot number in the bitmap in kernel memory.
A separate "culling atimes" file is produced which keeps track
exclusively of the last access time for each file, using a 32byte
unsigned integer. The position in this array again correlates with the
slot in the bitmap. On access, these times are updated -- so we avoid
relying on the filesystem access time mechanisms.
Now, in cachefilesd, we can avoid recursively scanning the file
structure of the cachefiles folder and instead begin assembling the
lowest M integers as seen in the culling atimes file. These M atimes can
represent a candidate set of (slot, atime) pairs to cull if/when
requested to do so. Once cachefiles enters its "need to cull" mode, the
cachefilesd service can pluck the top item off the culling queue and
request slot #i be culled, to which the kernel will happily delete the
entry in the bitmap, and mark invalid the entries in cull_index and
cull_atimes, along with the normal culling activities.
== Continuing Issues
The only problem is that this is still kind of slow for large caches.
For the rotational disks I have access to, 16 million cache entries can
take 95-120 msec to assemble the 4K oldest entries as culling
candidates, when the culling index is already warm in the page cache. If
the culling is to be absolutely correct /and/ fast, reading the entire
culling index from disk will need to be avoided. For much larger data
sets, the cost of simply reading the file becomes quite slow on a
standard, commodity hardware.
On this point, I am looking for suggestions on how to avoid reading the
/entire/ cull_atimes file for large data sets to assist in accurate
culling behaviors, OR for suggestions on reasonable approximations to
the problem to cull arbitrary items that are at least "likely to be
among the 25% oldest," as an example.
1. I had considered keeping a third index with some "statistics per
page" for cull_atimes, with things like averages, minimums or
maximums, but keeping track of min/max per page might incur some
large costs if we change a lot of information within a local block
of the cull_atimes file. Averages would help reading the file in an
optimal order, but wouldn't avoid us needing to read the whole
thing. (And there are precision issues...)
2. Keeping the atimes sorted is probably a non-starter, because
rewriting a 64 (16 million atimes) to 4GB (1 billion atimes) is not
an attractive offer.
3. It might be possible to create a third index that keeps track of
roughly what "bin" each slot falls into. e.g, if a cache object has
an atime of somewhere between 0 seconds from cache creation to 4096,
8192, etc it could be indexed to bin "0", the next slice up "1", etc.
In this way, we could at least be certain that even though the first
M numbers we read from disk aren't in order, we know that they are
the lowest (oldest) M numbers, and can perform some fairly quick
sorts on them in memory to perform accurate culling.
The question here, however, becomes: How wide should the bins be?
What is the format on disk? If an item "moves bins" during normal
operation, how long does the "update atime for slot #nnn"
transaction take in the worst and average cases?
I am still relatively new at cache designs, so if anyone has some input
or feedback on methods to quickly identify objects for culling, I am all
ears! Feel free to point out articles, books, anything :]
Thanks,
--John Huston,
RH Westford Newbie
--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs