On 01/31/2014 03:27 PM, James Bottomley wrote:
> On Fri, 2014-01-31 at 13:47 -0800, Dave Hansen wrote:
>> On 01/31/2014 11:02 AM, James Bottomley wrote:
>>> 3. Increase pgoff_t and the radix tree indexes to u64 for
>>> CONFIG_LBDAF. This will blow out the size of struct page on 32
>>> bits by 4 bytes and may have other knock on effects, but at
>>> least it will be transparent.
>>
>> I'm not sure how many acrobatics we want to go through for 32-bit, but...
>
> That's partly the question: 32 bits was dying in the x86 space (at least
> until quark), but it's still predominant in embedded.
>
>> Between page->mapping and page->index, we have 64 bits of space, which
>> *should* be plenty to uniquely identify a block. We could easily add a
>> second-level lookup somewhere so that we store some cookie for the
>> address_space instead of a direct pointer. How many devices would need,
>> practically? 8 bits worth?
>
> That might work. 8 bits would get us up to 4PB, which is looking a bit
> high for single disk spinning rust. However, how would the cookie work
> efficiently? remember we'll be doing this lookup every time we pull a
> page out of the page cache. And the problem is that most of our lookups
> will be on file inodes, which won't be > 16TB, so it's a lot of overhead
> in the generic machinery for a problem that only occurs on buffer
> related page cache lookups.

I think all we have to do is set a low bit in page->mapping (or in
page->flags, but that is more constrained) to say: "this isn't a direct
pointer". We only set the bit for the buffer cache pages, and thus only
go to the slow(er) lookup path for those. Whatever we use for the
lookups (radix tree or whatever) uses the remaining bits for an index.
We'd probably also need a last-lookup cache like mm->mmap_cache, but
probably not much more than that.

We already have page_mapping() in place to redirect folks away from
using page->mapping directly, so there shouldn't be too much code impact.
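
To make the tagged-pointer idea concrete, here is a rough userspace sketch,
not kernel code. Everything in it is an assumption for illustration: the
struct layouts are toy stand-ins, the bit split (1 tag bit, 8 cookie bits,
23 extra index bits in the low 32 bits of the mapping word) is arbitrary,
and bdev_table, bdev_register(), page_set_file_mapping(),
page_set_bdev_mapping() and page_index64() are hypothetical names. A flat
array stands in for "radix tree or whatever", and page_mapping() here only
borrows the name of the real helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures (hypothetical layouts). */
struct address_space { int id; };

struct page {
    uintptr_t mapping;  /* direct pointer, or tagged cookie word */
    uint32_t  index;    /* models a 32-bit pgoff_t */
};

/* Assumed layout of a tagged mapping word (low 32 bits only):
 *   bit 0      : 1 = "this isn't a direct pointer"
 *   bits 1..8  : device cookie
 *   bits 9..31 : extra high bits of the page index
 */
#define MAPPING_INDIRECT ((uintptr_t)1)
#define COOKIE_SHIFT     1
#define COOKIE_BITS      8
#define COOKIE_MASK      ((1u << COOKIE_BITS) - 1)
#define INDEX_HI_SHIFT   (COOKIE_SHIFT + COOKIE_BITS)
#define INDEX_HI_BITS    (32 - INDEX_HI_SHIFT)
#define INDEX_HI_MASK    ((1u << INDEX_HI_BITS) - 1)

/* Second-level lookup: cookie -> address_space. */
static struct address_space *bdev_table[1u << COOKIE_BITS];

/* One-entry last-lookup cache; last_mapping == NULL means empty. */
static unsigned int last_cookie;
static struct address_space *last_mapping;

static void bdev_register(unsigned int cookie, struct address_space *m)
{
    bdev_table[cookie & COOKIE_MASK] = m;
    last_mapping = NULL;            /* table changed: drop the cache */
}

/* Ordinary file page: store the pointer directly (low bit is clear
 * because the structure is at least 2-byte aligned). */
static void page_set_file_mapping(struct page *page,
                                  struct address_space *m, uint32_t index)
{
    page->mapping = (uintptr_t)m;
    page->index = index;
}

/* Buffer cache page: store tag + cookie + high index bits instead. */
static void page_set_bdev_mapping(struct page *page,
                                  unsigned int cookie, uint64_t index)
{
    uint32_t hi = (uint32_t)(index >> 32) & INDEX_HI_MASK;

    page->mapping = MAPPING_INDIRECT
                  | ((uintptr_t)(cookie & COOKIE_MASK) << COOKIE_SHIFT)
                  | ((uintptr_t)hi << INDEX_HI_SHIFT);
    page->index = (uint32_t)index;
}

static struct address_space *page_mapping(const struct page *page)
{
    unsigned int cookie;

    /* Fast path: untagged word is a plain pointer. */
    if (!(page->mapping & MAPPING_INDIRECT))
        return (struct address_space *)page->mapping;

    /* Slow(er) path, taken only for tagged pages. */
    cookie = (unsigned int)(page->mapping >> COOKIE_SHIFT) & COOKIE_MASK;
    if (last_mapping && cookie == last_cookie)
        return last_mapping;

    last_cookie = cookie;
    last_mapping = bdev_table[cookie];
    return last_mapping;
}

/* Full index: page->index plus any high bits kept in the mapping word. */
static uint64_t page_index64(const struct page *page)
{
    uint64_t hi = 0;

    if (page->mapping & MAPPING_INDIRECT)
        hi = (page->mapping >> INDEX_HI_SHIFT) & INDEX_HI_MASK;
    return (hi << 32) | page->index;
}
```

File pages pay only for the single bit test. The table lookup and the
cache check are confined to tagged (buffer cache) pages, which is the
property the proposal depends on.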