Re: Page tables on machines with >2GB RAM

On 9/29/20 8:14 PM, Matthew Wilcox wrote:
> On Tue, Sep 29, 2020 at 01:26:29PM -0400, John David Anglin wrote:
>> On 2020-09-29 1:01 p.m., Matthew Wilcox wrote:
>>> On Tue, Sep 29, 2020 at 04:33:16PM +0100, Matthew Wilcox wrote:
>>>> I think we can end up truncating a PMD or PGD entry (I get confused
>>>> easily about levels of the page tables; bear with me)
>>>>
>>>> /* NOTE: even on 64 bits, these entries are __u32 because we allocate
>>>>  * the pmd and pgd in ZONE_DMA (i.e. under 4GB) */
>>>> typedef struct { __u32 pgd; } pgd_t;
>>>> ...
>>>> typedef struct { __u32 pmd; } pmd_t;
>>>>
>>>> ...
>>>>
>>>>         pgd_t *pgd = (pgd_t *)__get_free_pages(GFP_KERNEL,
>>>>                                                PGD_ALLOC_ORDER);
>>>> ...
>>>>         return (pmd_t *)__get_free_pages(GFP_PGTABLE_KERNEL, PMD_ORDER);
>>>>
>>>> so if we have more than 2GB of RAM, we can allocate a page with the top
>>>> bit set, which we interpret to mean PAGE_PRESENT in the TLB miss handler
>>>> and mask it off, causing us to load the wrong page for the next level
>>>> of the page table walk.
>>>>
>>>> Have I missed something?
>>> Yes, yes I have.
>>>
>>> We store the PFN, not the physical address.  So we have 28 bits for
>>> storing the PFN and 4 bits for the PxD bits, supporting 28 + 12 = 40 bits
>>> (1TB) of physical address space.
>> The comment in pgalloc.h says 8TB?  I think improving the description as to how this works
>> would be welcome.
>
> It's talking about 8TB of virtual address space.  But I think it's wrong.
> On 64-bit,
>
> Each PTE defines a 4kB region of address space (ie one page).
> Each PMD is a 4kB allocation with 8-byte entries, so covers 512 * 4kB = 2MB

No, a PMD is a 4kB allocation with 4-byte entries, so it covers 1024 * 4kB = 4MB.
We always use 4-byte entries, for both 32- and 64-bit kernels.

> Each PGD is an 8kB allocation with 4-byte entries, so covers 2048 * 2M = 4GB

No, each PGD is a 4kB allocation with 4-byte entries, so it covers 1024 * 4MB = 4GB.
Still, my calculation ends up with 4GB, like yours.

> The top-level allocation is a 32kB allocation, but the first 8kB is used
> for the first PGD, so it covers 24kB / 4 bytes * 4GB = 24TB.

The size of the top-level PGD (swapper_pg_dir) is 8kB, so we have
8kB / 4 bytes * 4GB = 8TB of virtual address space.

At boot we want to map (1 << KERNEL_INITIAL_ORDER) pages (= 64MB on a 64-bit
kernel). For this, pmd0 gets pre-allocated with 8k size, and pg0 with 132k, to
simplify filling the initial page tables, but that's not relevant for
the calculations above.

> I think the top level allocation was supposed to be an order-2 allocation,
> which would be an 8TB address space, but it's order-3.
>
> There's a lot of commentary which disagrees with the code.  For example,
>
> #define PMD_ORDER       1 /* Number of pages per pmd */
> That's just not true; an order-1 allocation is 2 pages, not 1.

Yes, that should be fixed up.

Helge



