On Fri, Jan 31, 2014 at 10:10 AM, Mikulas Patocka
<mikulas@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Buffer cache is backed by pages from page cache. If we have page size 4k,
> page with index 0 maps sectors 0-7 [..]

Not at all necessarily.

One page might contain sectors 761, 51, 900 and 12-16. The buffer
heads have sector numbers that are *independent* of the page they are
in.

And we *use* that. Every single day. It's how the virtual file mapping
is done.

The buffer cache still supports it, and it still works. The buffer
cache also technically supports mixing sizes in the same page (and it
still does *not* have to be about _consecutive_ sectors!), but I won't
actually say that it works, because nobody has ever actually used that
capability.

And I explained how we used to do that EVEN FOR DIRECT BLOCK IO (and
how we had a bh hash chain for lookups). Christ, just read the email.
Stop this "it has to be consecutive sectors". Because it really
doesn't. It really *isn't* (for file backed pages).

The issue we have these days is that we actually dropped our buffer
cache hash chains, and buffer heads aren't actually independently
indexed any more. So now we rely entirely on the page cache index. So
*lookup* right now depends on one page containing sectors that are
"related" (not necessarily physically on disk, though), but that's a
small implementation detail and isn't even historically true.

Now, it may well not be worth re-introducing the buffer head hash
lists. I'm not saying we should do that. Your ugly patch may be the
smaller pain, because in the end, few enough filesystems actually want
different sector sizes.

So I'm really arguing to explain that the whole "sectors have to be
consecutive in a page" is BS. You seem to be somewhat confused about
the buffer cache usage, since you also thought that we don't alias
filesystem data and direct block device data. We really really do.
The same physical sectors can exist in both - in different pages, and
not coherent with each other.

The buffer cache is actually quite flexible. It's certainly not
perfect, and some filesystems have been migrating away from it due to
overheads (the bh allocations, for example, and many modern
filesystems like doing their IO directly using the bio interface
because it's closer to the disk, and once you do your caching in the
page cache yourself, the fact that buffer heads exist over more than
just the IO can be more of a pain than a gain), but it's actually
*designed* to do all this.

The "how to index it" is actually a fairly well separated issue from
the buffer cache. You can actually use the buffer heads without ever
really indexing them at all (and in many respects, that's how the page
cache uses them) and see them as just an IO entity. That was actually
a historical usage, but these days people would use a bio for that
case.

                Linus
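
The point that each buffer carries its own sector number, independent
of the page it lives in, can be illustrated with a small userspace
sketch. This is a toy model under stated assumptions, not kernel code:
`toy_bh`, `toy_page`, `toy_map_page` and `toy_offset_of_sector` are
hypothetical names invented for the illustration, and the 512-byte
sector and 4k page sizes are simply the ones used in the example
quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SECTOR_SIZE 512
#define TOY_PAGE_SIZE   4096
#define TOY_BH_PER_PAGE (TOY_PAGE_SIZE / TOY_SECTOR_SIZE)

/* Hypothetical stand-in for a buffer head: each one records its own
 * sector number, independent of where its data sits in the page. */
struct toy_bh {
    unsigned long sector;   /* on-disk sector backing this buffer */
    size_t offset;          /* byte offset of the data within the page */
};

/* Hypothetical stand-in for a page with buffers attached. */
struct toy_page {
    unsigned long index;    /* page cache index (logical position) */
    struct toy_bh bh[TOY_BH_PER_PAGE];
};

/* Attach one buffer per slot, taking sector numbers from the caller.
 * Nothing here requires the sectors to be consecutive or ordered. */
static void toy_map_page(struct toy_page *page, unsigned long index,
                         const unsigned long *sectors, size_t n)
{
    assert(n == TOY_BH_PER_PAGE);
    page->index = index;
    for (size_t i = 0; i < n; i++) {
        page->bh[i].sector = sectors[i];
        page->bh[i].offset = i * TOY_SECTOR_SIZE;
    }
}

/* Return the byte offset within the page that holds `sector`,
 * or -1 if this page has no buffer for that sector. */
static long toy_offset_of_sector(const struct toy_page *page,
                                 unsigned long sector)
{
    for (size_t i = 0; i < TOY_BH_PER_PAGE; i++)
        if (page->bh[i].sector == sector)
            return (long)page->bh[i].offset;
    return -1;
}
```

Mapping a page at index 0 with the sectors from the message (761, 51,
900, 12-16) and asking for sector 51 gives offset 512, while sector 0
is simply absent from that page even though its index is 0. The model
only covers the per-buffer sector number; it says nothing about how
buffers are looked up across pages, which the message treats as a
separate indexing question.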