On Wed, Jan 29, 2014 at 7:05 AM, Mikulas Patocka
<mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Suppose that 8 consecutive sectors on the disk contain this data:
> dnode (4 sectors)
> fnode (1 sector)
> file content (3 sectors)
> --- now, you can't access that fnode using a 2kB buffer; if you did,
> and if you marked that buffer dirty, you would damage the file content.
>
> So you need different-sized buffers on one page.

No. You're missing the whole point.

"consecutive sectors" does not mean "same page".

The page cache doesn't care. It never has. Non-consecutive sectors are
common for normal file mappings.

The *buffer* cache doesn't really care either, and in fact that
non-consecutive case used to be the common one (very much even for raw
disk accesses, exactly because things *used* to be coherent with a
mounted filesystem - so if there were files that had populated part of
the buffer cache with their non-consecutive sectors, the raw disk
access would just use those non-consecutive sectors).

And all that worked because we'd just look up the buffer head in the
hashes. The page it was on didn't matter.

The problem is that (not *that* long ago, relatively speaking) we
castrated the buffer cache so much (because almost nobody really uses
it any more) that now it's really a slave of the page cache, and we got
rid of the buffer head hashes entirely.

So now we look up the buffer heads using the page cache, and *that*
causes the problems (and forces us to put those buffer heads in the
same page, because we index by page).

We can actually still just create such non-consecutive buffers and do
IO on them; we just can't look them up any more.

              Linus
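
To make that last point concrete, here is a minimal sketch of what
"create such a buffer and do IO on it" looks like with the stock
helpers from fs/buffer.c of that era (alloc_buffer_head(),
set_bh_page(), submit_bh(), end_buffer_read_sync()). The wrapper
read_detached_block() is not real kernel code, just an illustration:
the buffer_head points at a private page and is never attached to the
page cache, so it can have any block number and size regardless of
what else maps onto that part of the disk - it just can't be found by
anybody else afterwards.

#include <linux/buffer_head.h>
#include <linux/blkdev.h>
#include <linux/gfp.h>

/*
 * Illustration only (this wrapper is not in the kernel): read one
 * block of 'size' bytes at block number 'block' (in 'size'-byte
 * units) from 'bdev' through a buffer_head that lives on a private
 * page and is never inserted into the page cache, so it cannot be
 * looked up - but the IO itself works fine.
 */
static struct buffer_head *read_detached_block(struct block_device *bdev,
					       sector_t block, unsigned size)
{
	struct page *page;
	struct buffer_head *bh;

	page = alloc_page(GFP_NOFS);
	if (!page)
		return NULL;

	bh = alloc_buffer_head(GFP_NOFS);
	if (!bh) {
		__free_page(page);
		return NULL;
	}

	set_bh_page(bh, page, 0);	/* b_data points into our page */
	bh->b_bdev = bdev;
	bh->b_blocknr = block;
	bh->b_size = size;
	set_buffer_mapped(bh);

	lock_buffer(bh);
	get_bh(bh);			/* end_buffer_read_sync() drops a ref */
	bh->b_end_io = end_buffer_read_sync;
	submit_bh(READ, bh);
	wait_on_buffer(bh);

	if (!buffer_uptodate(bh)) {
		free_buffer_head(bh);
		__free_page(page);
		return NULL;
	}
	return bh;	/* data is at bh->b_data; caller frees bh and page */
}

The only thing missing compared to the old world is the hash that
would have let somebody else find this buffer head again.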