Improving cachefilesd culling speed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



== Background

In the event of a nearly full disk or the consumption of most of our available inodes, cachefiles will instruct the daemon to begin culling files in the order of oldest-to-newest in order to free up resources. This can be quite slow for any appreciably large cache.

== First-pass improvement

David Howells produced a time ago part of a patch to improve this scheme by introducing into cachefiles a series of page-sized bitmaps which associated an ordinal index or slot number to each fscache object. In this scheme, each cachefiles file representation of a cache object is given an xattribute which indicates its slot number in the bitmap.

A "culling index" file is produced which contains (on an x86_64 machine) an array of 26 byte data structures which contain exported file handles to the cache files. The position of each structure correlates with the slot number in the bitmap in kernel memory.

A separate "culling atimes" file is produced which keeps track exclusively of the last access time for each file, using a 32byte unsigned integer. The position in this array again correlates with the slot in the bitmap. On access, these times are updated -- so we avoid relying on the filesystem access time mechanisms.

Now, in cachefilesd, we can avoid recursively scanning the file structure of the cachefiles folder and instead begin assembling the lowest M integers as seen in the culling atimes file. These M atimes can represent a candidate set of (slot, atime) pairs to cull if/when requested to do so. Once cachefiles enters its "need to cull" mode, the cachefilesd service can pluck the top item off the culling queue and request slot #i be culled, to which the kernel will happily delete the entry in the bitmap, and mark invalid the entries in cull_index and cull_atimes, along with the normal culling activities.

== Continuing Issues

The only problem is that this is still kind of slow for large caches. For the rotational disks I have access to, 16 million cache entries can take 95-120 msec to assemble the 4K oldest entries as culling candidates, when the culling index is already warm in the page cache. If the culling is to be absolutely correct /and/ fast, reading the entire culling index from disk will need to be avoided. For much larger data sets, the cost of simply reading the file becomes quite slow on a standard, commodity hardware.

On this point, I am looking for suggestions on how to avoid reading the /entire/ cull_atimes file for large data sets to assist in accurate culling behaviors, OR for suggestions on reasonable approximations to the problem to cull arbitrary items that are at least "likely to be among the 25% oldest," as an example.

1. I had considered keeping a third index with some "statistics per
   page" for cull_atimes, with things like averages, minimums or
   maximums, but keeping track of min/max per page might incur some
   large costs if we change a lot of information within a local block
   of the cull_atimes file. Averages would help reading the file in an
   optimal order, but wouldn't avoid us needing to read the whole
   thing. (And there are precision issues...)

2. Keeping the atimes sorted is probably a non-starter, because
   rewriting a 64 (16 million atimes) to 4GB (1 billion atimes) is not
   an attractive offer.

3. It might be possible to create a third index that keeps track of
   roughly what "bin" each slot falls into. e.g, if a cache object has
   an atime of somewhere between 0 seconds from cache creation to 4096,
   8192, etc it could be indexed to bin "0", the next slice up "1", etc.
   In this way, we could at least be certain that even though the first
   M numbers we read from disk aren't in order, we know that they are
   the lowest (oldest) M numbers, and can perform some fairly quick
   sorts on them in memory to perform accurate culling.

   The question here, however, becomes: How wide should the bins be?
   What is the format on disk? If an item "moves bins" during normal
   operation, how long does the "update atime for slot #nnn"
   transaction take in the worst and average cases?

I am still relatively new at cache designs, so if anyone has some input or feedback on methods to quickly identify objects for culling, I am all ears! Feel free to point out articles, books, anything :]


Thanks,
--John Huston,
RH Westford Newbie

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs




[Index of Archives]     [LARTC]     [Bugtraq]     [Yosemite Forum]
  Powered by Linux