Re: [LSF/MM TOPIC] Fixing large block devices on 32 bit

On 01/31/2014 03:27 PM, James Bottomley wrote:
> On Fri, 2014-01-31 at 13:47 -0800, Dave Hansen wrote:
>> On 01/31/2014 11:02 AM, James Bottomley wrote:
>>>      3. Increase pgoff_t and the radix tree indexes to u64 for
>>>         CONFIG_LBDAF.  This will blow out the size of struct page on 32
>>>         bits by 4 bytes and may have other knock on effects, but at
>>>         least it will be transparent.
>>
>> I'm not sure how many acrobatics we want to go through for 32-bit, but...
> 
> That's partly the question: 32 bits was dying in the x86 space (at least
> until quark), but it's still predominant in embedded.
> 
>> Between page->mapping and page->index, we have 64 bits of space, which
>> *should* be plenty to uniquely identify a block.  We could easily add a
>> second-level lookup somewhere so that we store some cookie for the
>> address_space instead of a direct pointer.  How many devices would we need,
>> practically?  8 bits worth?
> 
> That might work.  8 bits would get us up to 4PB, which is looking a bit
> high for single disk spinning rust.  However, how would the cookie work
> efficiently? remember we'll be doing this lookup every time we pull a
> page out of the page cache.  And the problem is that most of our lookups
> will be on file inodes, which won't be > 16TB, so it's a lot of overhead
> in the generic machinery for a problem that only occurs on buffer
> related page cache lookups.
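
For reference, the 16TB and 4PB figures above can be checked with a
short calculation.  This is only a sketch: it assumes 4 KiB pages, and
PAGE_SHIFT_SKETCH and max_bytes() are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so a page index is shifted by 12 bits. */
#define PAGE_SHIFT_SKETCH 12

/* Bytes addressable when the page index is index_bits wide. */
static uint64_t max_bytes(unsigned int index_bits)
{
	assert(index_bits + PAGE_SHIFT_SKETCH < 64);
	return (uint64_t)1 << (index_bits + PAGE_SHIFT_SKETCH);
}
```

With a 32-bit index this gives 2^44 bytes (16 TiB); with 8 more index
bits it gives 2^52 bytes (4 PiB).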

I think all we have to do is set a low bit in page->mapping (or in
page->flags, but it's more constrained) to say: "this isn't a direct
pointer".  We only set the bit for the buffer cache pages, and thus only
go to the slow(er) lookup path for those.  Whatever we use for the
lookups (radix tree or whatever) uses the remaining bits for an index.
We'd probably also need a last-lookup cache like mm->mmap_cache, but
probably not much more than that.

We already have page_mapping() in place to redirect folks away from
using page->mapping directly, so there shouldn't be too much code impact.
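
A rough userspace sketch of the idea, to make the encoding concrete.
Everything here is an assumption made for illustration: the struct
layouts, the bit positions (bit 0 as the tag, 8 bits of table slot,
8 bits of extra page offset), the flat array standing in for a radix
tree, and all function names.  None of it is the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
struct address_space { int dev_id; };
struct page {
	uintptr_t mapping;	/* direct pointer, or tagged cookie word */
	uint32_t index;		/* low 32 bits of the page offset */
};

#define MAPPING_INDIRECT ((uintptr_t)1)	/* low bit: not a direct pointer */
#define SLOT_SHIFT 1
#define SLOT_BITS  8
#define HIGH_SHIFT (SLOT_SHIFT + SLOT_BITS)
#define HIGH_BITS  8
#define NR_SLOTS   (1u << SLOT_BITS)

/* Flat table standing in for "radix tree or whatever". */
static struct address_space *slot_table[NR_SLOTS];

/* One-entry last-lookup cache; a real one would need invalidation. */
static unsigned int last_slot;
static struct address_space *last_mapping;

static void set_direct_mapping(struct page *page, struct address_space *m,
			       uint32_t index)
{
	/* Pointer alignment keeps the low bit free for the tag. */
	assert(((uintptr_t)m & MAPPING_INDIRECT) == 0);
	page->mapping = (uintptr_t)m;
	page->index = index;
}

static void set_indirect_mapping(struct page *page, unsigned int slot,
				 uint64_t offset)
{
	assert(slot < NR_SLOTS);
	assert((offset >> 32) < (1u << HIGH_BITS));
	page->mapping = MAPPING_INDIRECT |
			((uintptr_t)slot << SLOT_SHIFT) |
			((uintptr_t)(offset >> 32) << HIGH_SHIFT);
	page->index = (uint32_t)offset;
}

/* Accessor in the spirit of page_mapping(): callers never see the tag. */
static struct address_space *page_mapping_sketch(const struct page *page)
{
	unsigned int slot;

	if (!(page->mapping & MAPPING_INDIRECT))
		return (struct address_space *)page->mapping;	/* fast path */

	slot = (page->mapping >> SLOT_SHIFT) & (NR_SLOTS - 1);
	if (last_mapping && slot == last_slot)
		return last_mapping;		/* last-lookup cache hit */
	last_slot = slot;
	last_mapping = slot_table[slot];	/* slow(er) lookup path */
	return last_mapping;
}

/* Full page offset: extra high bits exist only for tagged pages. */
static uint64_t page_offset64(const struct page *page)
{
	uint64_t high = 0;

	if (page->mapping & MAPPING_INDIRECT)
		high = (page->mapping >> HIGH_SHIFT) & ((1u << HIGH_BITS) - 1);
	return (high << 32) | page->index;
}
```

File pages take the first branch and pay only for the bit test; only
buffer cache pages go through the table.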



