Re: Sysfs-Configurable readahead and background bypasses

Nix <nix@xxxxxxxxxxxxx> · Sun, 17 Feb 2019 00:23:53 +0000

On 16 Feb 2019, Coly Li told this:

> The reason we care about metadata here is, for some file systems, they
> do metadata readahead as sequential requests, and we want to keep such
> sequential metadata I/Os on cache device.

... and something is still not quite right here. I just did a git status
on the usual evil test case, a Chromium git repo on XFS-on-bcache-on-md.
I've done a complete backup indexing run before and 10GiB or so of
metadata has hit the cache device, yet the git status still caused it to
pound away at the disk for fifteen minutes or so, very seekily, with
bypassed I/O going up and nothing much happening to the cache hits *or*
cache misses.

(I have boosted the sequential_cutoff to 6144K on the grounds that, with
my RAID chunk size of 512K across the three data disks of a 5-disk RAID-6
and a sequential read rate of 200MiB/s/disk, it's only once you pass about
6144K that the time taken to read exceeds the typical seek time of about
7--10ms. A bit more stuff is getting cached, but not... *whatever* git
is doing here.)
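
For the record, the back-of-envelope arithmetic behind that cutoff, as a
minimal sketch. It assumes a sequential read is spread evenly over the
three data disks and ignores parity rotation and per-request overhead;
the function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def per_disk_read_ms(request_bytes, data_disks=3, rate_bytes_per_s=200 * MIB):
    """Time (ms) for each data disk to stream its share of one request.

    Assumption: the request is split evenly across data_disks and each
    disk streams at rate_bytes_per_s with no seek in between.
    """
    per_disk_bytes = request_bytes / data_disks
    return per_disk_bytes / rate_bytes_per_s * 1000.0

if __name__ == "__main__":
    # Compare a few request sizes against a 7--10ms seek.
    for kib in (512, 1536, 4096, 6144):
        print(f"{kib:>5}K: {per_disk_read_ms(kib * KIB):.2f} ms per disk")
```

At 6144K each data disk streams 2MiB, which takes 10ms at 200MiB/s, i.e.
the top of the seek-time range; anything smaller is cheaper to read than
to seek to.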

I'll do a drop_caches soon and try again, and examine what's going on
with blktrace, because something strange is happening here I think.
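
Alongside blktrace, the counters can be snapshotted before and after the
run. A rough sketch, assuming the device shows up as bcache0 and that the
stats_total directory carries bypassed, cache_hits and cache_misses as
described in the kernel's bcache documentation; the helper name is
hypothetical.

```python
import os

# Assumption: the bcache device is bcache0; adjust for the real device.
STATS_DIR = "/sys/block/bcache0/bcache/stats_total"
COUNTERS = ("bypassed", "cache_hits", "cache_misses")

def read_stats(stats_dir=STATS_DIR, names=COUNTERS):
    """Return {counter: raw string} for each counter file that exists.

    Values are kept as strings because some of them may be printed with
    a unit suffix rather than as a plain integer.
    """
    stats = {}
    for name in names:
        path = os.path.join(stats_dir, name)
        try:
            with open(path) as f:
                stats[name] = f.read().strip()
        except OSError:
            # Missing or unreadable counter: leave it out of the snapshot.
            continue
    return stats

if __name__ == "__main__":
    for name, value in read_stats().items():
        print(f"{name}: {value}")
```

Run it once before the git status and once after, and compare the two
outputs by eye.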

Hm, actually, it looks like "git status" reads the first line of every
file as well, which obviously a backup index run is not going to do
(that just stat()s everything). It's still not clear to me why *that*
was being bypassed though. Reading a few hundred bytes from each of
tens of thousands of files seems like exactly the sort of thing bcache
should be caching... more analysis needed, I think. Let's see, can I get
someone to give me a research grant :P
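
To take git out of the picture, the access pattern can be approximated
directly. This is only a sketch of "read a few hundred bytes from the
start of every file": it is not what git actually does internally, and
the function name and the 256-byte default are my own assumptions.

```python
import os

def read_file_heads(root, nbytes=256):
    """Read up to nbytes from the start of every regular file under root.

    Returns (files_read, bytes_read). The .git directory is skipped so
    that only the working tree is touched.
    """
    files_read = 0
    bytes_read = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only plain files; skip symlinks, sockets, FIFOs and so on.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    bytes_read += len(f.read(nbytes))
            except OSError:
                continue
            files_read += 1
    return files_read, bytes_read

if __name__ == "__main__":
    import sys
    print(read_file_heads(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Pointing this at the checkout after a drop_caches should show whether
the small reads alone are enough to trigger the bypassing.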

> For normal file readahead, if it is sequential I/O and exceeds
> sequential cutoff threshold, bcache won't have it. But if it is random,
> bcache may have it. It is about I/O patterns, not priorities.

Unless you're using the ioprio patch, in which case that matters too ;)
(A different sort of priority, though.)

> Your patch introduces cache policy with different I/O priorities
> (background or readahead), which may have overlap with I/O patterns,
> e.g. a background & random I/O. For such I/O, should we cache it or not?

As with scheduling, answering this correctly requires precognition,
because we really want to know whether the data will be blocked on by
processes the user is waiting on in the future. But if we had
precognition, we wouldn't need bcache at all but could just readahead
*everything* with perfect accuracy.

(And you could do a lot more too! See
<https://www.chiark.greenend.org.uk/~sgtatham/infinity.html>. The
closest real-world analogue of this we have is the way quantum computers
can carry out some classes of computations *even though the machine is
turned off*. As with much in this area, this is so hard to observe that
it's not of much immediate practical use...)