Re: [PATCH V2 0/3] Fix EFI memory map leaks

Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> · Fri, 21 Dec 2018 17:58:17 +0100

On Wed, 5 Dec 2018 at 17:21, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>
> > So if we are going to rename these things wholesale (which is fine
> > with me), I'd prefer it if we can drop the 'map' entirely.
> >
> > EFI memory table
> > EFI memory table entry
> > EFI memory table descriptor
>
> So if you don't actively hate the idea (which you don't seem to :) I can
> cook up a couple of patches and send the series out as an RFC - I think
> such reorganization is much better seen through the lens of an end
> result.
>
> I can then iterate/rebase the reorganization with low overhead, without
> committing them permanently yet.
>
> > For things described by the EFI memory table, we need to be more
> > specific anyway, since it typically describes everything, so
> >
> > EFI boot services area
> > EFI runtime services area
> > EFI reserved area
> > EFI conventional memory area
> > EFI MMIO area
> >
> > etc etc
>
> Yeah - that's the analogue of e820_entry->type, right?
>

Close enough, I guess.

> > Since we're being pedantic, it also makes sense to decide now whether
> > 'area' refers to all [discontiguous] regions or just one of them. I'd
> > say the former, and use 'region' for the latter, i.e., an area may be
> > made up of several regions, but only one exists of each type.
>
> Makes sense.
>
> Just to make clear I got it right: AFAICS we have 'struct
> efi_boot_memmap' that describes all of these areas, and boot_memmap->map
> is a linear array of efi_memory_desc_t entries, where each entry
> describes a contiguous range of memory of a given type, correct?
>

Yes.

> On that data structure level the notion of an 'area' doesn't really come
> up, which would be all the (discontiguous) entries of a given type. What
> we have is the 'map', the table, and the entries of the table.
>
> ( Regarding being pedantic: reading efi_insert_memmap() hit my
>   pain+confusion threshold, and I think it's better to follow a
>   hysteresis curve: i.e. instead of minimally moving it just below the
>   pain threshold we might as well bite the bullet and be really pedantic
>   and systematic about all of this in a single quick pass. That batches
>   it all up and is also generally a lot more fun and productive to
>   execute, as the end result will be a visible improvement (hopefully).
>   We did that with the FPU code and the e820 code recently and it was
>   rather successful. )
>

Yes, that works for me.

> Also, would it make sense to actually turn the table into a real C array,
> while taking the "->desc_size != sizeof(efi_memory_desc_t)" future
> expansion compatibility constraint into account:
>
>  - A single, quick bootstrap step allocates a proper array with known
>    entry sizes.
>

As in, taking the memory map provided by the firmware and rewriting it
so that any members we don't know about get dropped?

I'm not opposed to that per se, but on ARM/arm64, we never actually
modify the map at all like x86 does. Under kexec, the next kernel
actually gets the memory map straight from the firmware (although this
is orthogonal: if we rewrite it, the kexec kernel would simply do the
same )

>  - mem_table->entries[i] would thus actully exist on the C level, with a
>    mem_table->nr_entries limit - very similar to the e820 table code.
>

Could we just macroify this instead? I.e, use the () 'operator' to
hide the multiplication by ->desc_size to do the indexing.

>  - This would simplify the code at the expense of a single bootstrap
>    function that does the allocation and copying from EFI-format table to
>    Linux-format table.
>
>  - By all means efi_early_memdesc_ptr() and for_each_efi_memory_desc*()
>    are wrappers around a linear array with entry sizes unknown at build
>    time - that complication would largely go away.
>

Ok. I'm not convinced but I'll go along for now.

> Does the EFI runtime have any notion of the OS *modifying* the memory map
> and then the runtime relying on those modifications, or is it in practice
> treated as a read-only table?
>

No. The EFI memory map is provided by the firmware to the OS, but the
buffer it is returned in is owned by the caller, and only reflects the
memory map at the time of the call (although it cannot change anymore
after ExitBootServices())

> If we can make it a real array on the C level I was also hoping to sneak
> in the concept of an 'entry' somewhere, so that readers of the simple
> e820 table/entry logic would feel at home in the EFI code as well. Could
> we perhaps name 'descriptor' as 'entry', instead of 'region'?
>

Sure

> I.e. on the data structure level we'd have a clear hierarchy of
> table/entries/entry:
>
>    efi_mem_table:
>         ->entries[i]
>         ->nr_entries
>
> where 'entry->type' distinguishes the different areas.
>
> It's all pretty straightforward and would allow some unification of
> utility functions with the e820 code perhaps.
>
> In fact if we do that then we could probably rename:
>
>         struct e820_table  =>  struct x86_mem_table
>         struct e820_entry  =>  struct x86_mem_entry
>
> ... and extend x86_mem_entry with an 'attr' field?
>
> That way all the information stored in the EFI entry could be stored in
> an x86_mem_entry - and the EFI code (and the e820 code) could use this
> more generic table/entry data structure directly and deal with the raw
> underlying data types as little as possible.
>
> The e820 code already has a decoupled 'struct boot_e820_entry' data type
> for the boot protocol ABI, so I think 'e820_entry' could be extended and
> made generic - at least in principle.
>

I'm not fully up to speed with the history here. I assume EFI entries
are converted into E820 entries because that is what legacy BIOS
provides? So that means that this 'attr' field would have to be
populated on the non-EFI boot path as well. That is a bit out of my
area, so I will have to rely on others to assess that part.

> There's a number of high level advantages from doing that:
>
>  - Right now we put the EFI memory table entries into an e820 table and
>    then feed that to the memblock allocator. This is a mild conceptual
>    layering violation: EFI per definition has nothing to do with E820
>    BIOS calls. The e820_table/e820_entry is the de facto 'generic' entry
>    already, and the naming should now reflect that I believe.
>

Indeed

>  - There's 23 uses of for_each_efi_memory_desc*(), so the simplification
>    would be measurable even in low level EFI code.
>
>  - I'd also use the opportunity and collect these various pieces of
>    memory map functionality and decouple e820, EFI and generic memory map
>    handling methods a bit more - and maybe start sharing/unifying some of
>    it, such as user-defined memory maps which I believe are now
>    duplicated to a certain extent.
>
> Is this is a rough direction you'd agree to, or is there anything I'm
> missing here?
>

Apart from the point I made above, that ARM/arm64 don't do any
mangling of the memory map at all, this all looks reasonable to me.

One thing to keep in mind is that ia64 uses this code as well, and I
have no way to test any of that. Beyond that, I'd like to prevent x86
specifics to seep into the generic EFI code, but I don't think
aligning the two concepts should necessarily result in that. As you
say, looking at some draft patches should clarify this a lot.

Thanks,
Ard.