On Wed, 4 Mar 2020 at 16:49, Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
>
> On Wed, Mar 04, 2020 at 09:17:44AM +0100, Ard Biesheuvel wrote:
> > On Tue, 3 Mar 2020 at 21:54, Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
> > >
> > > Commit d367cef0a7f0 ("x86/mm/pat: Fix boot crash when 1GB pages are not
> > > supported by the CPU") added checking for CPU support for 1G pages
> > > before using them.
> > >
> > > However, when support is not present, nothing is done to map the
> > > intermediate 1G regions and we go directly to the code that normally
> > > maps the remainder after 1G mappings have been done. This code can only
> > > handle mappings that fit inside a single PUD entry, but there is no
> > > check, and it instead silently produces a corrupted mapping to the end
> > > of the PUD entry, and no mapping beyond it, but still returns success.
> > >
> > > This bug is encountered on EFI machines in mixed mode (32-bit firmware
> > > with 64-bit kernel), with RAM beyond 2G. The EFI support code
> > > direct-maps all the RAM, so a memory range from below 1G to above 2G
> > > triggers the bug and results in no mapping above 2G, and an incorrect
> > > mapping in the 1G-2G range. If the kernel resides in the 1G-2G range, a
> > > firmware call does not return correctly, and if it resides above 2G, we
> > > end up passing addresses that are not mapped in the EFI pagetable.
> > >
> > > Fix this by mapping the 1G regions using 2M pages when 1G page support
> > > is not available.
> > >
> > > Signed-off-by: Arvind Sankar <nivedita@xxxxxxxxxxxx>
> >
> > I was trying to test these patches, and while they seem fine from a
> > regression point of view, I can't seem to reproduce this issue and
> > make it go away again by applying this patch.
> >
> > Do you have any detailed instructions how to reproduce this?
> >
>
> The steps I'm following are
> - build x86_64 defconfig + enable EFI_PGT_DUMP (to show the incorrect
>   pagetable)
> - run (QEMU is 4.2.0)
> $ qemu-system-x86_64 -cpu Haswell -pflash qemu/OVMF_32.fd -m 3072 -nographic \
>   -kernel kernel64/arch/x86/boot/bzImage -append "earlyprintk=ttyS0,keep efi=debug nokaslr"
>
> The EFI memory map I get is (abbreviated to regions of interest):
> ...
> [ 0.253991] efi: mem10: [Conventional Memory| | | | | | | | | |WB|WT|WC|UC] range=[0x00000000053e7000-0x000000003fffbfff] (940MB)
> [ 0.254424] efi: mem11: [Loader Data | | | | | | | | | |WB|WT|WC|UC] range=[0x000000003fffc000-0x000000003fffffff] (0MB)
> [ 0.254991] efi: mem12: [Conventional Memory| | | | | | | | | |WB|WT|WC|UC] range=[0x0000000040000000-0x00000000bbf77fff] (1983MB)
> ...
>
> The pagetable this produces is (abbreviated again):
> ...
> [ 0.272980] 0x0000000003400000-0x0000000004800000 20M ro PSE x pmd
> [ 0.273327] 0x0000000004800000-0x0000000005200000 10M RW PSE NX pmd
> [ 0.273987] 0x0000000005200000-0x0000000005400000 2M RW NX pte
> [ 0.274343] 0x0000000005400000-0x000000003fe00000 938M RW PSE NX pmd
> [ 0.274725] 0x000000003fe00000-0x0000000040000000 2M RW NX pte
> [ 0.275066] 0x0000000040000000-0x0000000080000000 1G RW PSE NX pmd
> [ 0.275437] 0x0000000080000000-0x00000000bbe00000 958M pmd
> ...
>
> Note how the 0x80000000-0xbbe00000 range is unmapped in the resulting
> pagetable. The dump doesn't show physical addresses, but the
> 0x40000000-0x80000000 range is incorrectly mapped as well, as the loop
> in populate_pmd would just go over that virtual address range twice.
>
> 	while (end - start >= PMD_SIZE) {
> 		...
> 		pmd = pmd_offset(pud, start);
>
> 		set_pmd(pmd, pmd_mkhuge(pfn_pmd(cpa->pfn,
> 				canon_pgprot(pmd_pgprot))));
>
> 		start += PMD_SIZE;
> 		cpa->pfn += PMD_SIZE >> PAGE_SHIFT;
> 		cur_pages += PMD_SIZE >> PAGE_SHIFT;
> 	}

I've tried a couple of different ways, but I can't seem to get my
memory map organized in the way that will trigger the error.