On Tue, 28 Jan 2014, Linus Torvalds wrote:

> On Tue, Jan 28, 2014 at 3:11 PM, Mikulas Patocka
> <mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > HPFS needs to load 4 consecutive 512-byte sectors when accessing the
> > directory nodes or bitmaps. We can't switch to 2048-byte block size
> > because files are allocated in the units of 512-byte sectors.
>
> Bah, this is untrue.
>
> Well, it's true that you cannot *switch* to another size, but the
> buffer head layer should be perfectly happy with mixed sizes within a
> device, even if nobody happens to do it.
>
> Just allocate a whole page, and make *that* page use 2048-byte buffers.
>
> So you should be perfectly able to just do
>
>         struct buffer_head *bh = __bread(dev, nr, 2048);
>
> which gets and reads a single 2048-byte buffer head.
>
> Now, the problem is that because nobody actually does this, I bet we
> have bugs in this area, and some path ends up using
> bd_inode->i_blkbits instead of the passed-in size. A very quick look
> finds __find_get_block -> __find_get_block_slow() looking bad, for
> example.
>
> But I also bet that that should be easy to fix. In fact, I think the
> only reason we use "i_blkbits" there is because it avoids a division
> (and nobody had a *reason* to do it), but since this is the "we have
> to do IO" path, just passing in the size and then using a
> "sector_div()" is a no-brainer from a performance standpoint, I
> think. So fixing that problem looks like a couple of lines.
>
> Now, another issue is that with multiple block sizes, it's up to the
> filesystem to then guarantee that there isn't aliasing between two
> physical blocks (eg say "2048b sized block at offset 1" vs "512b
> buffer-head at offset 4"). But if the aliasing is fairly naturally
> avoided at the FS level (and if this is done only for particular
> parts of the filesystem, that should be very natural), that shouldn't
> be a problem either.
>
> So I'd actually much rather see us taking advantage of multiple
> buffer sizes and use a *native* 2k buffer-head when it makes sense,
> than this odd "let's allocate them, and then maybe they are all
> properly aligned anyway" kind of voodoo programming.
>
> Would you be willing to try?
>
>                 Linus

The page cache doesn't handle buffers of different sizes on one page.
HPFS has some 2kB structures (dnodes, bitmaps) and some 512-byte
structures (fnodes, anodes), so we can have a 4kB page that contains
one 2kB dnode and four 512-byte anodes or fnodes. That layout is
impossible to create with create_empty_buffers. (And you can't access
the 512-byte structures through 2kB buffers, because that could trash
file data lying in the same 2kB area.)

__find_get_block_slow walks all the buffers on the page and finds the
one that matches; that is fixable. But I have no clue how to replace
create_empty_buffers. How much other code depends on the fact that the
buffers on a page completely cover it and all have the same size?

There is a general problem - HPFS has a very small user base, and no
one is really stress-testing it. I'm not stress-testing it either; I
have some data on HPFS, but I'm not running any intensive parallel
workload on it. So even if we hacked the page cache to support
different-size buffers on a single page, there would likely be bugs in
it, and no one would find them for a long time because of the tiny
HPFS user base. I think it would be better to try this approach first
on some widely used filesystem; once it is debugged, I can switch HPFS
over to it.
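For concreteness, here is roughly what Linus is proposing, written as a
hypothetical HPFS helper (hpfs_bread_dnode is a made-up name, not
existing code, and it assumes the lookup paths are fixed to honor the
passed-in size):

	/* Hypothetical helper, not current HPFS code: read one dnode
	 * as a single native 2048-byte buffer head. Assumes
	 * __find_get_block_slow() and friends use the passed-in size
	 * instead of bd_inode->i_blkbits. */
	static struct buffer_head *hpfs_bread_dnode(struct super_block *s,
						    unsigned secno)
	{
		/* __bread() takes the block number in units of "size",
		 * so convert the 512-byte sector number to a 2048-byte
		 * block number; dnodes are 2048-byte aligned on HPFS,
		 * so the shift loses nothing. */
		return __bread(s->s_bdev, (sector_t)secno >> 2, 2048);
	}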
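And the __find_get_block_slow change he describes would, as far as I
can see, mostly be a matter of computing the page index from the
caller's size instead of from i_blkbits. A sketch (block_to_page_index
is a made-up name, only to show the computation):

	/* block * size is the byte offset of the buffer on the
	 * device; shifting by PAGE_CACHE_SHIFT gives the page that
	 * holds it. The multiplication also avoids the division that
	 * using i_blkbits was there to avoid. */
	static pgoff_t block_to_page_index(sector_t block, unsigned size)
	{
		return (pgoff_t)((block * size) >> PAGE_CACHE_SHIFT);
	}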
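To show why create_empty_buffers can't express the HPFS layout: it
takes a single block size and covers the whole page with equal-sized
buffers, so for a 4kB page the only choices are, for example:

	create_empty_buffers(page, 512, 0);	/* eight 512-byte buffers */

or

	create_empty_buffers(page, 2048, 0);	/* two 2048-byte buffers */

whereas the mixed page described above would need one 2048-byte buffer
followed by four 512-byte buffers, and there is no interface for that.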
Mikulas