On Fri, 31 Jan 2014, Linus Torvalds wrote:

> On Fri, Jan 31, 2014 at 10:10 AM, Mikulas Patocka
> <mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Buffer cache is backed by pages from page cache. If we have page size 4k,
> > page with index 0 maps sectors 0-7 [..]
>
> Not at all necessarily.
>
> One page might contain sectors 761, 51, 900 and 12-16. The buffer
> heads have sector numbers that are *independent* of the page they are
> in.
>
> Christ, just read the email. Stop this "it has to be consecutive
> sectors". Because it really doesn't. It really *isn't* (for file
> backed pages).

I understand that. Sure, the mapping is non-consecutive for file-based
pages. But the pages that are used for sb_bread, sb_getblk and related
functions map a consecutive area on the disk. So - if we are talking in
this thread about the implementation of sb_bread, we can say that the
pages used for sb_bread map a consecutive area.

> The issue we have these days is that we actually dropped our buffer
> cache hash chains, and buffer heads aren't actually independently
> indexed any more. So now rely entirely on the page cache index. So
> *lookup* right now depends on one page containing sectors that are
> "related" (not necessarily physically on disk, though), but that's a
> small implementation detail and isn't even historically true.
>
> Now, it may well not be worth re-introducing the buffer head hash
> lists. I'm not saying we should do that. Your ugly patch may be the
> smaller pain, because in the end, few enough filesystems actually want
> different sector sizes. So I'm really arguing to explain that the
> whole "sectors have to be consecutive in a page" is BS.

Yes, it may be better to apply the patch than to redesign the buffer cache
for different-sized buffers.

> You seem to be somewhat confused about the buffer cache usage, since
> you also thought that we don't alias filesystem data and direct block
> device data, We really really do.
> The same physical sectors can exist
> in both - in different pages, and not coherent with each other.

I understand it - that's why I said that you can't access on-disk
structures on HPFS using a buffer that is larger than the structure itself
- because it may alias a file and (as you correctly say) there is no
coherency.

Mikulas
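
For illustration only, here is a small user-space sketch (not kernel code) of the arithmetic discussed in this message. It assumes 512-byte sectors and 4k pages, and all function names are hypothetical. The first two helpers show the consecutive mapping claimed for the pages behind sb_bread (page index 0 covers sectors 0-7). The third shows which extra sectors get pulled in when a structure is read through a buffer larger than the structure itself. It assumes the buffer is aligned to its own size and that the structure fits inside one buffer, which is a simplification.

```python
SECTOR_SIZE = 512   # bytes per sector (assumed for this sketch)
PAGE_SIZE = 4096    # bytes per page (assumed, the "page size 4k" case)


def page_index_for_sector(sector, page_size=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Page index holding `sector` under a consecutive sector-to-page mapping."""
    return sector * sector_size // page_size


def sectors_in_page(index, page_size=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Range of sectors covered by page `index` under a consecutive mapping."""
    per_page = page_size // sector_size
    return range(index * per_page, (index + 1) * per_page)


def extra_sectors(struct_sector, struct_sectors, buffer_sectors):
    """Sectors a buffer covers beyond the structure it was read for.

    Assumes the buffer is aligned to its own size (hypothetical
    simplification) and that the structure fits inside one buffer.
    """
    start = struct_sector - struct_sector % buffer_sectors
    covered = range(start, start + buffer_sectors)
    wanted = range(struct_sector, struct_sector + struct_sectors)
    return [s for s in covered if s not in wanted]
```

With these assumptions, sectors_in_page(0) covers sectors 0 through 7, and extra_sectors(10, 1, 4) lists sectors 8, 9 and 11: the sectors that a 4-sector buffer drags in around a 1-sector structure at sector 10. Any of those could belong to file data that is also cached elsewhere, which is the aliasing concern above.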