Re: [PATCH 2/2] hpfs: optimize quad buffer loading

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 31 Jan 2014 10:40:54 -0800

On Fri, Jan 31, 2014 at 10:10 AM, Mikulas Patocka
<mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Buffer cache is backed by pages from page cache. If we have page size 4k,
> page with index 0 maps sectors 0-7 [..]

Not at all necessarily.

One page might contain sectors 761, 51, 900 and 12-16. The buffer
heads have sector numbers that are *independent* of the page they are
in.

And we *use* that. Every single day. It's how the virtual file mapping
is done. The buffer cache still supports it, and it still works. The
buffer cache also technically supports mixing sizes in the same page
(and it still does *not* have to be about _consecutive_ sectors!), but
I won't actually say that it works, because nobody has ever actually
used that capability.
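
To make the point concrete, here is a rough userspace sketch, not kernel
code. The names (sketch_bh, sketch_page, sketch_attach,
sketch_is_consecutive) are made up for illustration and the struct is a
heavily simplified stand-in for a real buffer head. It only shows that
each buffer records its own block number, so one 4k page can hold
sectors 761, 51, 900 and 12-16 in that order.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_BH    8

/* Hypothetical, simplified stand-in for a buffer head. */
struct sketch_bh {
    uint64_t blocknr;  /* on-disk block number, independent of the page */
    uint32_t size;     /* bytes this buffer covers */
    uint32_t offset;   /* byte offset of the buffer inside the page */
};

/* Hypothetical page: an index plus the buffers attached to it. */
struct sketch_page {
    uint64_t index;    /* page cache index */
    size_t nr_bh;
    struct sketch_bh bh[SKETCH_MAX_BH];
};

/*
 * Attach a buffer for an arbitrary block at the next free offset.
 * Returns 0 on success, -1 if the page has no room left.
 */
static int sketch_attach(struct sketch_page *p, uint64_t blocknr, uint32_t size)
{
    uint32_t used = 0;

    for (size_t i = 0; i < p->nr_bh; i++)
        used += p->bh[i].size;
    if (p->nr_bh == SKETCH_MAX_BH || used + size > SKETCH_PAGE_SIZE)
        return -1;

    p->bh[p->nr_bh] = (struct sketch_bh){
        .blocknr = blocknr, .size = size, .offset = used,
    };
    p->nr_bh++;
    return 0;
}

/* Returns 1 only if the block numbers in the page happen to be consecutive. */
static int sketch_is_consecutive(const struct sketch_page *p)
{
    for (size_t i = 1; i < p->nr_bh; i++)
        if (p->bh[i].blocknr != p->bh[i - 1].blocknr + 1)
            return 0;
    return 1;
}
```

Attaching 761, 51, 900, 12, 13, 14, 15, 16 as 512-byte buffers fills the
page exactly, and sketch_is_consecutive() reports 0 for it. Nothing in
the data structure cares.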

And I explained how we used to do that EVEN FOR DIRECT BLOCK IO (and
how we had a bh hash chain for lookups).

Christ, just read the email. Stop this "it has to be consecutive
sectors". Because it really doesn't. It really *isn't* (for file
backed pages).

The issue we have these days is that we actually dropped our buffer
cache hash chains, and buffer heads aren't actually independently
indexed any more. So now we rely entirely on the page cache index. So
*lookup* right now depends on one page containing sectors that are
"related" (not necessarily physically on disk, though), but that's a
small implementation detail and isn't even historically true.
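
The difference between the two lookup schemes can be sketched like this.
Again this is an illustrative userspace model under stated assumptions,
not the kernel implementation: all lk_* names are hypothetical, the page
array stands in for the page cache, and a plain linear scan stands in
for the old hash chains.

```c
#include <stddef.h>
#include <stdint.h>

#define LK_PAGE_SHIFT  12   /* 4k pages */
#define LK_BH_PER_PAGE 8
#define LK_NR_PAGES    4

/* Hypothetical simplified buffer head and page. */
struct lk_bh   { uint64_t blocknr; int valid; };
struct lk_page { struct lk_bh bh[LK_BH_PER_PAGE]; };

struct lk_cache {
    unsigned blkbits;                  /* log2 of block size, assumed <= 12 */
    struct lk_page pages[LK_NR_PAGES]; /* stand-in for the page cache */
};

/*
 * Lookup through the page index: compute the one page that is expected
 * to hold the block and search only the buffers in that page.
 */
static struct lk_bh *lk_find_by_page_index(struct lk_cache *c, uint64_t block)
{
    uint64_t index = block >> (LK_PAGE_SHIFT - c->blkbits);

    if (index >= LK_NR_PAGES)
        return NULL;
    for (size_t i = 0; i < LK_BH_PER_PAGE; i++) {
        struct lk_bh *bh = &c->pages[index].bh[i];
        if (bh->valid && bh->blocknr == block)
            return bh;
    }
    return NULL;
}

/*
 * Independent lookup: find the block wherever it lives. A real design
 * would hash on the block number; the full scan here is only a stand-in.
 */
static struct lk_bh *lk_find_independent(struct lk_cache *c, uint64_t block)
{
    for (size_t p = 0; p < LK_NR_PAGES; p++)
        for (size_t i = 0; i < LK_BH_PER_PAGE; i++) {
            struct lk_bh *bh = &c->pages[p].bh[i];
            if (bh->valid && bh->blocknr == block)
                return bh;
        }
    return NULL;
}
```

With 512-byte blocks, a buffer for block 900 parked in page 0 is found
by lk_find_independent() but missed by lk_find_by_page_index(), which
only looks at page 112. That is the lookup restriction, and it is a
property of the index, not of the buffers.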

Now, it may well not be worth re-introducing the buffer head hash
lists. I'm not saying we should do that. Your ugly patch may be the
smaller pain, because in the end, few enough filesystems actually want
different sector sizes. So I'm really arguing to explain that the
whole "sectors have to be consecutive in a page" is BS.

You seem to be somewhat confused about the buffer cache usage, since
you also thought that we don't alias filesystem data and direct block
device data. We really really do. The same physical sectors can exist
in both - in different pages, and not coherent with each other.
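
As a toy model of that aliasing (hypothetical al_* names, userspace
only, and assuming a 16-sector "disk" just for the example): two cached
copies of the same sector, one standing for a file-backed page and one
for a block-device page, are filled from the same place and then drift
apart.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AL_SECTOR_SIZE 512
#define AL_NR_SECTORS  16

/* Hypothetical tiny disk. */
struct al_disk { uint8_t sectors[AL_NR_SECTORS][AL_SECTOR_SIZE]; };

/* Hypothetical cached copy of one sector, living in "some page". */
struct al_cached {
    uint64_t sector;
    uint8_t data[AL_SECTOR_SIZE];
};

/* Fill a cached copy from disk. Returns 0 on success, -1 if out of range. */
static int al_read(const struct al_disk *d, struct al_cached *c, uint64_t sector)
{
    if (sector >= AL_NR_SECTORS)
        return -1;
    c->sector = sector;
    memcpy(c->data, d->sectors[sector], AL_SECTOR_SIZE);
    return 0;
}

/* Modify only this cached copy; nothing else is told about it. */
static void al_modify(struct al_cached *c, size_t off, uint8_t value)
{
    if (off < AL_SECTOR_SIZE)
        c->data[off] = value;
}

/* Write this copy back to disk; other cached copies are not updated. */
static void al_writeback(struct al_disk *d, const struct al_cached *c)
{
    memcpy(d->sectors[c->sector], c->data, AL_SECTOR_SIZE);
}
```

Read sector 3 into both copies, modify and write back one of them, and
the other copy still holds the old contents. Same physical sector, two
pages, no coherency between them.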

The buffer cache is actually quite flexible. It's certainly not
perfect, and some filesystems have been migrating away from it due to
overheads (the bh allocations, for example). Many modern filesystems
like doing their IO directly using the bio interface because it's
closer to the disk, and once you do your caching in the page cache
yourself, the fact that buffer heads exist over more than just the IO
can be more of a pain than a gain. But the buffer cache is actually
*designed* to do all this.

The "how to index it" is actually a fairly well separated issue from
the buffer cache. You can actually use the buffer heads without ever
really indexing them at all (and in many respects, that's how the page
cache uses them) and see them as just an IO entity. That was actually
a historical usage, but these days people would use a bio for that
case.

             Linus