On Tue, Nov 11, 2014 at 05:55:24PM +0000, Ard Biesheuvel wrote:
> On 11 November 2014 18:44, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > On Tue, Nov 11, 2014 at 05:12:09PM +0000, Mark Salter wrote:
> >> On Tue, 2014-11-11 at 10:42 -0500, Mark Salter wrote:
> >> > On Mon, 2014-11-10 at 08:31 +0100, Ard Biesheuvel wrote:
> >> > > On 10 November 2014 05:11, Mark Salter <msalter@xxxxxxxxxx> wrote:
> >> > > > On Thu, 2014-11-06 at 15:13 +0100, Ard Biesheuvel wrote:
> >> > > >> This changes the way memblocks are installed based on the contents
> >> > > >> of the UEFI memory map. Formerly, all regions would be
> >> > > >> memblock_add()'ed, after which unusable regions would be
> >> > > >> memblock_reserve()'d as well. To simplify things, but also to allow
> >> > > >> access to the unusable regions through mmap(/dev/mem), even with
> >> > > >> CONFIG_STRICT_DEVMEM set, change this so that only usable regions
> >> > > >> are memblock_add()'ed in the first place.
> >> > > >
> >> > > > This patch is crashing 64K pagesize kernels during boot. I'm not
> >> > > > exactly sure why, though. Here is what I get on an APM Mustang box:
> >> > > >
> >> > >
> >> > > Ah, yes, I meant to mention this patch
> >> > >
> >> > > https://git.kernel.org/cgit/linux/kernel/git/glikely/linux.git/commit/?id=8cccffc52694938fc88f3d90bc7fed8460e27191
> >> > >
> >> > > in the cover letter, which addresses this issue at least for the DT
> >> > > case.
> >> > >
> >> >
> >> > That isn't the problem. In general, with 64K kernel pages, you can't be
> >> > sure whether you've left something you need out of the kernel linear
> >> > mapping.
> >
> > Regardless of 64k pages, you can never assume that anything will be in
> > the linear mapping due to the current relationship between the start of
> > the linear map and the address the kernel was loaded at.
> >
> >> > If you have Loader Code/Data regions begin and/or end at something other
> >> > than a 64K boundary and that region is adjacent to a region not being
> >> > added, then you end up leaving out the unaligned bits from the linear
> >> > mapping. This could be bits of the initramfs or devicetree.
> >> >
> >> > What I don't get with this failure is that it is an alignment fault,
> >> > which should be masked at EL1 for the kernel. The same unaligned
> >> > access happens without this patch and it doesn't generate a fault.
> >> >
> >>
> >> Ah, but unaligned accesses are not ignored for device memory.
> >> I have this in include/acpi/acpi_io.h:
> >>
> >> static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
> >>                                             acpi_size size)
> >> {
> >> #ifdef CONFIG_ARM64
> >>         if (!page_is_ram(phys >> PAGE_SHIFT))
> >>                 return ioremap(phys, size);
> >> #endif
> >>
> >>         return ioremap_cache(phys, size);
> >> }
> >>
> >> Because the table isn't in the linear mapping, it fails the
> >> page_is_ram() test and it gets mapped with ioremap(), leading to
> >> the alignment fault.
> >>
> >> If I take out the code inside the #ifdef, I get a different
> >> fault:
> >>
> >> [    0.350057] Unhandled fault: synchronous external abort (0x96000010) at 0xfffffe0000fae6f4
> >> [    0.358704] pgd = fffffe0001160000
> >> [    0.362276] [fffffe0000fae6f4] *pgd=0000004001370003, *pud=0000004001370003, *pmd=0000004001370003, *pte=02c00040011a0713
> >> [    0.373746] Internal error: : 96000010 [#1] SMP
> >> [    0.378484] Modules linked in:
> >> [    0.381601] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc4+ #15
> >> [    0.388248] Hardware name: APM X-Gene Mustang board (DT)
> >> [    0.393738] task: fffffe03dbe10000 ti: fffffe03dbf00000 task.ti: fffffe03dbf00000
> >> [    0.401503] PC is at acpi_ex_system_memory_space_handler+0x238/0x2e0
> >> [    0.408160] LR is at acpi_ex_system_memory_space_handler+0x130/0x2e0
> >>
> >> That happens because AML is trying to access a hardware register
> >> which has been mapped as normal memory.
> >
> > Which is why removing the check in the ifdef is completely nonsensical.
> > I am sure we are all in agreement on this part.
> >
> > We already knew we can't map everything cacheable -- regardless of what
> > AML does, the CPU can prefetch from anything mapped cacheable (or simply
> > executable) at any time.
> >
> >> So, we need a way to tell a table in RAM from an I/O address in AML.
> >> And page_is_ram() no longer cuts it if the tables are not in the
> >> linear mapping.
> >
> > If the kernel were loaded at an address above the tables, it would fail
> > similarly. So page_is_ram() was never sufficient to ensure tables
> > would be mapped cacheable.
> >
> > As we haven't decoupled the kernel text mapping from the linear mapping,
> > and that doesn't look to be happening any time soon, we can't fix up
> > page_is_ram -- we need something else entirely.
> >
>
> Well, if you look at the series, and in particular at the /dev/mem
> handling, there is already some code that classifies physical
> addresses based on whether they appear in the UEFI memory map, and
> with which attributes. I suppose reimplementing page_is_ram() [whose
> default implementation is conveniently __weak] to return true for
> EFI_MEMORY_WB ranges and false for everything else would do the trick
> here, and would arguably be more elegant than matching the string
> "System RAM" in the resource table. And putting the UEFI memory map at
> the centre of the RAM-remapping-related functions ensures that they
> are always in agreement (or so the theory goes).

Sure. We'll have to be careful to fall back to the current behaviour
for !UEFI systems, though.

Thanks,
Mark.
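As a rough sketch of the idea Ard describes above, something along these
lines could override the __weak page_is_ram() default, treating only
EFI_MEMORY_WB regions as RAM. The single-argument form of
for_each_efi_memory_desc() and the memblock_is_memory() fallback for
!UEFI boots are assumptions made for illustration; this is not code from
the posted series.

#include <linux/efi.h>
#include <linux/memblock.h>
#include <linux/mm.h>
#include <linux/pfn.h>

/*
 * Sketch only: drive page_is_ram() from the UEFI memory map instead of
 * matching "System RAM" in the resource table.  The iterator form and
 * the !UEFI fallback are assumptions, not part of the series.
 */
int page_is_ram(unsigned long pfn)
{
	phys_addr_t addr = PFN_PHYS(pfn);
	efi_memory_desc_t *md;

	/* !UEFI boot: approximate the old behaviour via memblock. */
	if (!efi_enabled(EFI_MEMMAP))
		return memblock_is_memory(addr);

	for_each_efi_memory_desc(md) {
		u64 start = md->phys_addr;
		u64 end   = start + (md->num_pages << EFI_PAGE_SHIFT);

		if (addr >= start && addr < end) {
			/* Only firmware-declared write-back regions count as RAM. */
			return !!(md->attribute & EFI_MEMORY_WB);
		}
	}

	return 0;
}

Keying page_is_ram() off the UEFI memory map would leave acpi_os_ioremap()
and the /dev/mem checks consulting the same view of memory, which is the
consistency argument made above.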