Re: Page tables on machines with >2GB RAM

On Tue, Sep 29, 2020 at 01:26:29PM -0400, John David Anglin wrote:
> On 2020-09-29 1:01 p.m., Matthew Wilcox wrote:
> > On Tue, Sep 29, 2020 at 04:33:16PM +0100, Matthew Wilcox wrote:
> >> I think we can end up truncating a PMD or PGD entry (I get confused
> >> easily about levels of the page tables; bear with me)
> >>
> >> /* NOTE: even on 64 bits, these entries are __u32 because we allocate
> >>  * the pmd and pgd in ZONE_DMA (i.e. under 4GB) */
> >> typedef struct { __u32 pgd; } pgd_t;
> >> ...
> >> typedef struct { __u32 pmd; } pmd_t;
> >>
> >> ...
> >>
> >>         pgd_t *pgd = (pgd_t *)__get_free_pages(GFP_KERNEL,
> >>                                                PGD_ALLOC_ORDER);
> >> ...
> >>         return (pmd_t *)__get_free_pages(GFP_PGTABLE_KERNEL, PMD_ORDER);
> >>
> >> so if we have more than 2GB of RAM, we can allocate a page with the top
> >> bit set, which we interpret to mean PAGE_PRESENT in the TLB miss handler
> >> and mask it off, causing us to load the wrong page for the next level
> >> of the page table walk.
> >>
> >> Have I missed something?
> > Yes, yes I have.
> >
> > We store the PFN, not the physical address.  So we have 28 bits for
> > storing the PFN and 4 bits for the PxD bits, supporting 28 + 12 = 40 bits
> > (1TB) of physical address space.
> The comment in pgalloc.h says 8TB?  I think improving the description as to how this works
> would be welcome.

It's talking about 8TB of virtual address space.  But I think it's wrong.
On 64-bit,

Each PTE defines a 4kB region of address space (i.e. one page).
Each PMD is a 4kB allocation with 8-byte entries, so covers 512 * 4kB = 2MB.
Each PGD is an 8kB allocation with 4-byte entries, so covers 2048 * 2MB = 4GB.
The top-level allocation is a 32kB allocation, but the first 8kB is used
for the first PGD, so it covers (24kB / 4 bytes) * 4GB = 24TB.

I think the top level allocation was supposed to be an order-2 allocation,
which would be an 8TB address space, but it's order-3.
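
To make the arithmetic above easy to re-check, here is a minimal sketch
in C.  All of the names (SKETCH_*, sketch_*) are made up for this
illustration and are not kernel identifiers; the sizes are simply the
figures quoted in this mail, not values read from the parisc headers.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the mail; hypothetical names, not kernel macros. */
#define SKETCH_PAGE_SIZE       4096ULL  /* 4kB page */
#define SKETCH_LOW_ALLOC       4096ULL  /* 4kB allocation ... */
#define SKETCH_LOW_ENTRY_SIZE  8ULL     /* ... with 8-byte entries */
#define SKETCH_MID_ALLOC       8192ULL  /* 8kB allocation ... */
#define SKETCH_MID_ENTRY_SIZE  4ULL     /* ... with 4-byte entries */

/* Address space covered by the 4kB table: 512 entries * 4kB. */
static uint64_t sketch_low_table_span(void)
{
    return (SKETCH_LOW_ALLOC / SKETCH_LOW_ENTRY_SIZE) * SKETCH_PAGE_SIZE;
}

/* Address space covered by the 8kB table: 2048 entries * 2MB. */
static uint64_t sketch_mid_table_span(void)
{
    return (SKETCH_MID_ALLOC / SKETCH_MID_ENTRY_SIZE) * sketch_low_table_span();
}

/*
 * Address space covered by a top-level allocation of the given order,
 * assuming (as described above) that its first 8kB is used for the
 * first 8kB table and the rest holds 4-byte top-level entries.
 */
static uint64_t sketch_top_level_span(unsigned int order)
{
    uint64_t alloc = SKETCH_PAGE_SIZE << order;

    assert(alloc > SKETCH_MID_ALLOC);
    return ((alloc - SKETCH_MID_ALLOC) / SKETCH_MID_ENTRY_SIZE)
           * sketch_mid_table_span();
}
```

With these assumptions, sketch_top_level_span(3) comes out to 24TB and
sketch_top_level_span(2) to 8TB, which matches the two figures above.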

There's a lot of commentary which disagrees with the code.  For example,

#define PMD_ORDER       1 /* Number of pages per pmd */

That's just not true; an order-1 allocation is 2 pages, not 1.
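
For reference, the order/pages relationship can be sketched as below.
The helper names are hypothetical and exist only for this example; the
4kB page size is the one assumed throughout this mail.

```c
#include <assert.h>

/* Assumed page size for this illustration (4kB). */
#define ORDER_SKETCH_PAGE_SIZE 4096UL

/* An order-N allocation is 2^N contiguous pages. */
static unsigned long sketch_pages_for_order(unsigned int order)
{
    assert(order < sizeof(unsigned long) * 8);
    return 1UL << order;
}

/* Size in bytes of an order-N allocation. */
static unsigned long sketch_bytes_for_order(unsigned int order)
{
    return sketch_pages_for_order(order) * ORDER_SKETCH_PAGE_SIZE;
}
```

So an order of 1 means two pages (8kB), and the order-3 top-level
allocation mentioned earlier is eight pages (32kB).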


