On Thu, 30 Jan 2014, Linus Torvalds wrote:

> On Wed, Jan 29, 2014 at 7:05 AM, Mikulas Patocka
> <mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Suppose that 8 consecutive sectors on the disk contain this data:
> > dnode (4 sectors)
> > fnode (1 sector)
> > file content (3 sectors)
> > --- now, you can't access that fnode using 2kB buffer, if you did and if
> > you marked that buffer dirty, you damage file content.
> >
> > So you need different-sized buffers on one page.
>
> No. You're missing the whole point.
>
> "consecutive sectors" does not mean "same page".

Huh? Let me say it again: we have an 8-sector disk area that is aligned
on a page boundary (for example, sector number 24).

There is dnode on sectors 24-27
There is fnode on sector 28
There is file content on sectors 29-31

So, I claim this - if you access that fnode using 2k buffer (so the
buffer contains not only the fnode, but also the following 3 sectors) and
you mark that buffer head dirty, you may cause data corruption on the
file.

If you disagree with it, say how is it supposed to work.

> The problem is that (not *that* long ago, relatively speaking) we have
> castrated the buffer cache so much (because almost nobody really uses
> it any more) that now it's really a slave of the page cache, and we
> got rid of the buffer head hashes entirely. So now we look up the
> buffer heads using the page cache, and *that* causes the problems (and
> forces us to put those buffer heads in the same page, because we index
> by page).
>
> We can actually still just create such non-consecutive buffers and do
> IO on them, we just can't look them up any more.
>
> Linus

Each page has a single-linked circular list of buffers. So you could in
theory put buffers of different size on that list.

For example, the page for sector 24 (page index 3) could have one buffer
with block number 6 and block size 2048 and four buffers with block
numbers 28-31 and block size 512.
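
To make the numbers above concrete, here is a small user-space sketch of
such a mixed-size list. It is not kernel code: struct toy_buffer and all
of the helper names are made up for illustration, and only the
arithmetic (512-byte sectors, 4096-byte pages) is taken from the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SECTOR_BYTES 512ULL
#define PAGE_BYTES   4096ULL

/* Hypothetical stand-in for struct buffer_head; not the kernel's layout. */
struct toy_buffer {
    unsigned long long blocknr;  /* block number, counted in units of size */
    unsigned long long size;     /* block size in bytes */
    struct toy_buffer *next;     /* next buffer on the same page (circular) */
};

/* Index of the 4096-byte page that contains the start of the block. */
unsigned long long page_index(unsigned long long blocknr,
                              unsigned long long size)
{
    return blocknr * size / PAGE_BYTES;
}

/* First 512-byte sector covered by the buffer. */
unsigned long long first_sector(const struct toy_buffer *b)
{
    return b->blocknr * b->size / SECTOR_BYTES;
}

/* Last 512-byte sector covered by the buffer. */
unsigned long long last_sector(const struct toy_buffer *b)
{
    return first_sector(b) + b->size / SECTOR_BYTES - 1;
}

/* Fill bufs[0..4] with the example and link them into a circular list:
 * one 2048-byte buffer (block 6) and four 512-byte buffers (blocks 28-31). */
struct toy_buffer *build_example_page(struct toy_buffer bufs[5])
{
    bufs[0].blocknr = 6;
    bufs[0].size = 2048;
    for (int i = 1; i < 5; i++) {
        bufs[i].blocknr = 27 + i;
        bufs[i].size = 512;
    }
    for (int i = 0; i < 5; i++) {
        /* Every buffer in the example must land on page index 3. */
        assert(page_index(bufs[i].blocknr, bufs[i].size) == 3);
        bufs[i].next = &bufs[(i + 1) % 5];
    }
    return &bufs[0];
}

/* Walk the circular list, matching on block number AND block size. */
struct toy_buffer *find_buffer(struct toy_buffer *head,
                               unsigned long long blocknr,
                               unsigned long long size)
{
    struct toy_buffer *b = head;

    if (head == NULL)
        return NULL;
    do {
        if (b->blocknr == blocknr && b->size == size)
            return b;
        b = b->next;
    } while (b != head);
    return NULL;
}

/* Print the sector range that each buffer on the page covers. */
void print_page(struct toy_buffer *head)
{
    struct toy_buffer *b = head;

    if (head == NULL)
        return;
    do {
        printf("block %llu, size %llu: sectors %llu-%llu\n",
               b->blocknr, b->size, first_sector(b), last_sector(b));
        b = b->next;
    } while (b != head);
}
```

With a five-element array passed to build_example_page(),
find_buffer(head, 28, 512) returns the buffer that holds only the fnode,
and find_buffer(head, 28, 2048) returns NULL. A 2048-byte buffer with
block number 7 would instead cover sectors 28-31, that is, the fnode
together with the three sectors of file content.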

It would be possible to find all those five buffers in
__find_get_block_slow if you passed block size as an argument to
__find_get_block_slow and if you verified block size when searching the
linked list.

In theory you could put buffers with all possible combinations of buffer
size on that linked list: the page at index 3 could have on its list the
following 15 buffers:
8 512-byte buffers with block numbers 24-31
4 1024-byte buffers with block numbers 12-15
2 2048-byte buffers with block numbers 6-7
1 4096-byte buffer with block number 3

Mikulas