On Tue, 28 Jan 2014, Linus Torvalds wrote:

> On Tue, Jan 28, 2014 at 5:01 PM, Mikulas Patocka
> <mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > The page cache doesn't handle different-size buffers for one page.
>
> Correct, but that should not be relevant..
>
> > HPFS has some 2kB structures (dnodes, bitmaps) and some 512-byte
> > structures (fnodes, anodes). We can have a 4kB page that contains one
> > 2kB dnode and four 512-byte anodes or fnodes. That is impossible to
> > create with create_empty_buffers.
>
> Damn. You're both right and wrong.
>
> It's true that buffer heads within a page have to be the same size,
> but that's not really relevant - you don't work with pages, so you
> could have two totally independent 2kB buffer heads allocated to
> within one page.

Suppose that 8 consecutive sectors on the disk contain this data:

	dnode		(4 sectors)
	fnode		(1 sector)
	file content	(3 sectors)

Now you can't access that fnode using a 2kB buffer: if you did, and you
marked that buffer dirty, you would damage the file content next to it.
So you do need different-sized buffers on one page.

> And that's actually how filesystems that virtually map pages do things
> - they just fill the page with (equal-sized) buffer heads indexed on
> the filesystem inode, and the buffer heads don't have to be related to
> each other physically on the disk.
>
> In fact, the sizes don't even really *have* to be the same (in theory
> the list of page buffers could point to five buffers: one 2k and four
> 512-byte bhs), but all the helper functions that populate the buffer
> head lists etc. do assume that.
>
> And way back when, buffer heads had their own hashed lookup, so even
> with the bd_dev approach you could have two non-consecutive
> independent 2kB bh's in the same page.
>
> So you used to be wrong.
>
> But the reason you're right is that we got rid of the buffer head
> hashes, and now use the page-level hashing to look up the page that
> the buffer heads are in, which does mean that now you can't really
> alias different sizes on different pages any more, or have one page
> that contains buffer heads that aren't related to each other
> physically on the disk.

The page-level lookup doesn't seem like a problem to me. All you need to
do is add a "blocksize" argument to __find_get_block_slow, change

	index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);

to

	index = block >> (PAGE_CACHE_SHIFT - __ffs(blocksize));

and change

	else if (bh->b_blocknr == block)

to

	else if (bh->b_blocknr == block && bh->b_size == blocksize)

That would be enough to find a buffer with a different block size on the
same page (a sketch of the resulting function is appended at the end of
this mail).

The worse problem is how to create such buffers in the first place, and
how to synchronize them with concurrent access from userspace through
def_blk_aops - that would be very hard.

> So yeah, very annoying, we're *so* close to being able to do this, but
> because the buffer heads are really no longer "primary" data
> structures and don't have any indexing of their own, we can't actually
> do it.
>
>              Linus

Mikulas
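
To make Linus's "five buffers" remark above concrete, here is one way
such a mixed-size ring could be built by hand, since
create_empty_buffers() only handles one size per page. This is a
hypothetical, untested sketch: attach_mixed_buffers() exists nowhere in
the kernel, although alloc_buffer_head(), set_bh_page(),
attach_page_buffers() and free_buffer_head() are the real primitives of
that era. Locking, buffer state setup and page flags are glossed over.

/*
 * Hypothetical: attach one 2kB bh plus four 512-byte bhs to a 4kB page.
 * For illustration only - the rest of the buffer layer still assumes
 * one size per page.
 */
static int attach_mixed_buffers(struct page *page)
{
	static const unsigned int sizes[] = { 2048, 512, 512, 512, 512 };
	struct buffer_head *head = NULL, *tail = NULL, *bh;
	unsigned long offset = 0;
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(sizes); i++) {
		bh = alloc_buffer_head(GFP_NOFS);
		if (!bh)
			goto nomem;
		bh->b_size = sizes[i];
		/* point b_page/b_data at the right piece of the page */
		set_bh_page(bh, page, offset);
		offset += sizes[i];
		if (tail)
			tail->b_this_page = bh;
		else
			head = bh;
		tail = bh;
	}
	tail->b_this_page = head;		/* close the b_this_page ring */
	attach_page_buffers(page, head);	/* page->private = head */
	return 0;

nomem:
	/* the partial chain is still NULL-terminated, so just walk it */
	while (head) {
		bh = head->b_this_page;
		free_buffer_head(head);
		head = bh;
	}
	return -ENOMEM;
}

This only shows that the b_this_page ring itself has no structural
problem with mixed sizes; as the thread says, the lookup, the helpers
and writeback all still assume a single size.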
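
And here is what __find_get_block_slow() would look like with the two
changes described above applied, based on the fs/buffer.c of that era.
The extra "blocksize" parameter (and updating every caller to pass it)
is the proposed, hypothetical part; the "all_mapped" debugging path of
the real function is omitted for brevity, and none of this is tested.

static struct buffer_head *
__find_get_block_slow(struct block_device *bdev, sector_t block,
		      unsigned int blocksize)
{
	struct inode *bd_inode = bdev->bd_inode;
	struct address_space *bd_mapping = bd_inode->i_mapping;
	struct buffer_head *ret = NULL;
	struct buffer_head *bh, *head;
	struct page *page;
	pgoff_t index;

	/* index by the caller's block size, not bd_inode->i_blkbits */
	index = block >> (PAGE_CACHE_SHIFT - __ffs(blocksize));
	page = find_get_page(bd_mapping, index);
	if (!page)
		return NULL;

	spin_lock(&bd_mapping->private_lock);
	if (!page_has_buffers(page))
		goto out_unlock;
	bh = head = page_buffers(page);
	do {
		/*
		 * Match on the size as well as the block number, so that
		 * buffers of different sizes can coexist on one page
		 * without aliasing each other.
		 */
		if (buffer_mapped(bh) && bh->b_blocknr == block &&
		    bh->b_size == blocksize) {
			get_bh(bh);
			ret = bh;
			break;
		}
		bh = bh->b_this_page;
	} while (bh != head);
out_unlock:
	spin_unlock(&bd_mapping->private_lock);
	page_cache_release(page);
	return ret;
}

As the mail says, this lookup change is the easy half; creating the
mixed-size buffers and keeping them coherent with def_blk_aops access
is the hard part.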