On 9/29/20 8:14 PM, Matthew Wilcox wrote:
> On Tue, Sep 29, 2020 at 01:26:29PM -0400, John David Anglin wrote:
>> On 2020-09-29 1:01 p.m., Matthew Wilcox wrote:
>>> On Tue, Sep 29, 2020 at 04:33:16PM +0100, Matthew Wilcox wrote:
>>>> I think we can end up truncating a PMD or PGD entry (I get confused
>>>> easily about levels of the page tables; bear with me)
>>>>
>>>> /* NOTE: even on 64 bits, these entries are __u32 because we allocate
>>>>  * the pmd and pgd in ZONE_DMA (i.e. under 4GB) */
>>>> typedef struct { __u32 pgd; } pgd_t;
>>>> ...
>>>> typedef struct { __u32 pmd; } pmd_t;
>>>>
>>>> ...
>>>>
>>>> pgd_t *pgd = (pgd_t *)__get_free_pages(GFP_KERNEL,
>>>>                                        PGD_ALLOC_ORDER);
>>>> ...
>>>> return (pmd_t *)__get_free_pages(GFP_PGTABLE_KERNEL, PMD_ORDER);
>>>>
>>>> so if we have more than 2GB of RAM, we can allocate a page with the top
>>>> bit set, which we interpret to mean PAGE_PRESENT in the TLB miss handler
>>>> and mask it off, causing us to load the wrong page for the next level
>>>> of the page table walk.
>>>>
>>>> Have I missed something?
>>> Yes, yes I have.
>>>
>>> We store the PFN, not the physical address. So we have 28 bits for
>>> storing the PFN and 4 bits for the PxD bits, supporting 28 + 12 = 40 bits
>>> (1TB) of physical address space.
>> The comment in pgalloc.h says 8TB? I think improving the description as to how this works
>> would be welcome.
>
> It's talking about 8TB of virtual address space. But I think it's wrong.
> On 64-bit,
>
> Each PTE defines a 4kB region of address space (ie one page).
> Each PMD is a 4kB allocation with 8-byte entries, so covers 512 * 4kB = 2MB

No, a PMD is a 4kB allocation with 4-byte entries, so it covers 1024 * 4kB = 4MB.
We always use 4-byte entries, for both 32- and 64-bit kernels.

> Each PGD is an 8kB allocation with 4-byte entries, so covers 2048 * 2M = 4GB

No, each PGD is a 4kB allocation with 4-byte entries, so it covers 1024 * 4MB = 4GB.
Still, my calculation ends up with 4GB, like yours.
> The top-level allocation is a 32kB allocation, but the first 8kB is used
> for the first PGD, so it covers 24kB / 4 bytes * 4GB = 24TB.

The size of the PGD (swapper_pg_dir) is 8kB, so we have 8kB / 4 bytes * 4GB = 8TB
of virtual address space.

At boot we want to map (1 << KERNEL_INITIAL_ORDER) bytes (= 64MB on a 64-bit
kernel). For this, pmd0 is pre-allocated with a size of 8kB and pg0 with 132kB,
to simplify filling the initial page tables, but that's not relevant for the
calculations above.

> I think the top level allocation was supposed to be an order-2 allocation,
> which would be an 8TB address space, but it's order-3.
>
> There's a lot of commentary which disagrees with the code. For example,
>
> #define PMD_ORDER 1 /* Number of pages per pmd */
> That's just not true; an order-1 allocation is 2 pages, not 1.

Yes, that should be fixed up.

Helge