On Mon, 4 Jan 2016 21:17:31 +0100
Laszlo Ersek <lersek@xxxxxxxxxx> wrote:

> Michael CC'd me on the grandparent of the email below. I'll try to add
> my thoughts in a single go, with regard to OVMF.
>
> On 12/30/15 20:52, Michael S. Tsirkin wrote:
> > On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
> >> On Mon, 28 Dec 2015 14:50:15 +0200
> >> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >>
> >>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> >>>>
> >>>> Hi Michael, Paolo,
> >>>>
> >>>> Now it is time to return to the challenge of how to reserve a guest
> >>>> physical region internally used by ACPI.
> >>>>
> >>>> Igor suggested that:
> >>>> | An alternative place to allocate reserve from could be high memory.
> >>>> | For pc we have "reserved-memory-end" which currently makes sure
> >>>> | that hotpluggable memory range isn't used by firmware
> >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
>
> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> reason is that nobody wrote that patch, nor asked for the patch to be
> written. (Not implying that just requesting the patch would be
> sufficient for the patch to be written.)
>
> >>> I don't want to tie things to reserved-memory-end because this
> >>> does not scale: next time we need to reserve memory,
> >>> we'll need to find yet another way to figure out what is where.
> >> Could you elaborate a bit more on the problem you're seeing?
> >>
> >> To me it looks like it scales rather well.
> >> For example, let's imagine that we are adding a device
> >> that has some on-device memory that should be mapped into GPA space.
> >> The code to do so would look like:
> >>
> >> pc_machine_device_plug_cb(dev)
> >> {
> >>     ...
> >>     if (dev == OUR_NEW_DEVICE_TYPE) {
> >>         memory_region_add_subregion(as, current_reserved_end, &dev->mr);
> >>         set_new_reserved_end(current_reserved_end +
> >>                              memory_region_size(&dev->mr));
> >>     }
> >> }
> >>
> >> We can practically add any number of new devices that way.
> >
> > Yes, but we'll have to build a host-side allocator for these, and
> > that's nasty. We'll also have to maintain these addresses indefinitely
> > (at least per machine version) as they are guest visible.
> > Not only that, there's no way for the guest to know if we move things
> > around, so basically we'll never be able to change the addresses.
> >
> >
> >>
> >>> I would like the ./hw/acpi/bios-linker-loader.c interface to be
> >>> extended to support 64-bit RAM instead
>
> This looks quite doable in OVMF, as long as the blob to allocate from
> high memory contains *zero* ACPI tables.
>
> (
> Namely, each ACPI table is installed from the containing fw_cfg blob
> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
> own allocation policy for the *copies* of ACPI tables it installs.
>
> This allocation policy is left unspecified in the section of the UEFI
> spec that governs EFI_ACPI_TABLE_PROTOCOL.
>
> The current policy in edk2 (= the reference implementation) seems to be
> "allocate from under 4GB". It is currently being changed to "try to
> allocate from under 4GB, and if that fails, retry from high memory".
> (The change is motivated by AArch64 machines that may have no DRAM at
> all under 4GB.)
> )
>
> >>> (and maybe a way to allocate and
> >>> zero-initialize a buffer without loading it through fw_cfg),
>
> Sounds reasonable.
>
> >>> this way the BIOS
> >>> does the allocation, and addresses can be patched into ACPI.
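For reference, the ALLOCATE command entry in today's linker/loader
interface looks roughly like the abridged sketch below (after
hw/acpi/bios-linker-loader.c). One conceivable way to support 64-bit RAM
would be a new allocation zone; the ZONE_HIGH64 value at the end is
purely hypothetical, not an existing interface:

    /* abridged; see hw/acpi/bios-linker-loader.c for the full layout */
    struct BiosLinkerLoaderEntry {
        uint32_t command;            /* ALLOCATE, ADD_POINTER, ... */
        union {
            struct {
                char     file[BIOS_LINKER_LOADER_FILESZ]; /* fw_cfg blob name */
                uint32_t align;      /* required alignment, power of 2 */
                uint8_t  zone;       /* one of the ZONE_* values below */
            } alloc;
            /* ... ADD_POINTER / ADD_CHECKSUM argument structs ... */
            char pad[124];           /* fixed 128-byte entry size */
        };
    } QEMU_PACKED;

    enum {
        BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH   = 0x1, /* "high" RAM (in practice
                                                       below 4GB today) */
        BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG   = 0x2, /* firmware F-segment */
        BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH64 = 0x3, /* hypothetical: allow
                                                       allocation above 4GB */
    };

Keeping the entry size fixed at 128 bytes means new commands or fields
can be added without breaking the framing of the command stream.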
> >> and then the guest side needs to parse/execute some AML that would
> >> initialize the QEMU side so it would know where to write data.
> >
> > Well, not really - we can put it in a data table, by itself,
> > so it's easy to find.
>
> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
> acpi_get_table_with_size()?
>
> >
> > AML is only needed if access from ACPI is desired.
> >
> >
> >> bios-linker-loader is a great interface for initializing some
> >> guest-owned data and linking it together, but I think it adds
> >> unnecessary complexity and is misused if it's used to handle
> >> device-owned data/on-device memory, as in this case and the VMGID
> >> case.
> >
> > I want a generic interface for the guest to enumerate these things.
> > The linker seems quite reasonable, but if you see a reason why it
> > won't do, or want to propose a better interface, fine.
>
> * The guest could do the following:
> - while processing the ALLOCATE commands, it would make a note where in
> GPA space each fw_cfg blob gets allocated
> - at the end the guest would prepare a temporary array with a
> predefined record format, which associates each fw_cfg blob's name with
> the concrete allocation address
> - it would create an FWCfgDmaAccess structure pointing at this array,
> with a new "control" bit set (or something similar)
> - the guest could write the address of the FWCfgDmaAccess struct to the
> appropriate register, as always.
>
> * Another idea would be a GET_ALLOCATION_ADDRESS linker/loader command
> (sketched below), specifying:
> - the fw_cfg blob's name, for which to retrieve the guest-allocated
> address (this command could only follow the matching ALLOCATE
> command, never precede it)
> - a flag whether the address should be written to IO or MMIO space
> (would likely be IO on x86, MMIO on ARM)
> - a unique uint64_t key (could be the 16-bit fw_cfg selector value that
> identifies the blob, actually!)
> - a uint64_t (IO or MMIO) address to write the unique key and then the
> allocation address to.
>
> Either way, QEMU could learn about all the relevant guest-side
> allocation addresses in a low number of traps. In addition, AML code
> wouldn't have to reflect any allocation addresses to QEMU, ever.
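To make that second idea concrete, here is a minimal sketch of what such
a GET_ALLOCATION_ADDRESS entry could look like if it reused the existing
fixed-size BiosLinkerLoaderEntry format from hw/acpi/bios-linker-loader.c.
The command value and all field names are hypothetical, not an existing
QEMU interface:

    /* hypothetical new command value */
    #define BIOS_LINKER_LOADER_COMMAND_GET_ALLOC_ADDR 0x4

    /* hypothetical new member of the BiosLinkerLoaderEntry union */
    struct {
        /* fw_cfg blob whose guest-side allocation address to report;
         * must match the file of a preceding ALLOCATE command */
        char     file[BIOS_LINKER_LOADER_FILESZ];
        /* unique key, e.g. the blob's 16-bit fw_cfg selector */
        uint64_t key;
        /* register the guest writes the key, and then the 64-bit
         * allocation address, to */
        uint64_t result_reg;
        /* 0: result_reg is an IO port (x86); 1: an MMIO address (ARM) */
        uint8_t  result_is_mmio;
    } get_alloc_addr;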
That would be a nice trick, but I see two issues here:

1. The ACPI tables blob is built atomically once the guest tries to
   read it from fw_cfg, so the patched addresses would have to be
   communicated to QEMU before that.
2. More importantly, I think we would be misusing the linker-loader
   interface here: we would be allocating a buffer in guest RAM and
   thus consuming it, while all the device needs is a window into
   device memory, mapped somewhere outside of the RAM-occupied address
   space.

> >
> > PCI would do, too - though the Windows guys had concerns about
> > returning PCI BARs from ACPI.
> >
> >
> >> There was an RFC on the list to make the BIOS boot from NVDIMM,
> >> already doing some ACPI table lookup/parsing. Now if they were
> >> forced to also parse and execute AML to initialize QEMU with a
> >> guest-allocated address, that would complicate them quite a bit.
> >
> > If they just need to find a table by name, it won't be
> > too bad, would it?
> >
> >> While with the NVDIMM control memory region mapped directly by
> >> QEMU, the respective patches don't need to initialize QEMU in any
> >> way; all they need to do is read the necessary data from the
> >> control region.
> >>
> >> Also, using bios-linker-loader takes away some usable RAM
> >> from the guest, and in the end that doesn't scale:
> >> the more devices I add, the less usable RAM is left for the guest
> >> OS, while all the device needs is a piece of GPA address space
> >> that would belong to it.
> >
> > I don't get this comment. I don't think it's MMIO that is wanted.
> > If it's backed by qemu virtual memory then it's RAM.
> >
> >>>
> >>> See patch at the bottom that might be handy.
>
> I've given up on Microsoft implementing DataTableRegion. (It's sad,
> really.)
>
> From last year I have a WIP version of "docs/vmgenid.txt" that is
> based on Michael's build_append_named_dword() function. If
> GET_ALLOCATION_ADDRESS above looks good, then I could simplify the
> ACPI stuff in that text file (and hopefully post it soon after for
> comments?)
>
> >>>
> >>>> He also innovated a way to use 64-bit addresses in DSDT/SSDT with
> >>>> rev = 1:
> >>>> | when writing ASL one shall make sure that only XP-supported
> >>>> | features are in global scope, which is evaluated when tables
> >>>> | are loaded, and features of rev2 and higher are inside methods.
> >>>> | That way XP doesn't crash, as long as it doesn't evaluate an
> >>>> | unsupported opcode, and one can guard those opcodes by checking
> >>>> | the _REV object if necessary.
> >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> >>>
> >>> Yes, this technique works.
>
> Agreed.
>
> >>>
> >>> An alternative is to add an XSDT; XP ignores that.
> >>> XSDT at the moment breaks OVMF (because it loads both
> >>> the RSDT and the XSDT, which is wrong), but I think
> >>> Laszlo was working on a fix for that.
>
> We have to distinguish two use cases here.
>
> * The first is the case when QEMU prepares both an XSDT and an RSDT,
> and links at least one common ACPI table from both. This would cause
> OVMF to pass the same source (= to-be-copied) table to
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() twice, with one of the
> following outcomes:
>
> - there would be two instances of the same table (think e.g. SSDT)
> - the second attempt would be rejected (e.g. FADT) and that error
> would terminate the linker-loader procedure.
>
> This issue would not be too hard to overcome, with a simple
> "memoization technique". After the initial loading & linking of the
> tables, OVMF could remember the addresses of the "source" ACPI tables,
> and could avoid passing already installed source tables to
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() a second time.
>
> * The second use case is when an ACPI table is linked *only* from
> QEMU's XSDT. This is much harder to fix, because
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() in edk2 links the copy of
> the passed-in table into *both* RSDT and XSDT, automatically. And,
> again, the UEFI spec doesn't provide a way to control this from the
> caller (i.e. from within OVMF).
>
> I tried earlier to effect a change in the specification of
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), on the ASWG and USWG
> mailing lists. (At that time I was trying to expose to the caller the
> UEFI memory *type* from which the copy of the ACPI table being
> installed should be allocated.) Alas, I received no answers at all.
>
> All in all I strongly recommend the "place rev2+ objects in method
> scope" trick over the "link it from the XSDT only" trick.
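As a concrete (if minimal) sketch of that trick, here is roughly how a
guarded method could be built with QEMU's C AML builder API. This is
illustrative only: the MTHD, RES3 and RES6 names are hypothetical, and
the helper signatures follow the API as of early 2016:

    /* XP's rev1 interpreter evaluates global scope at table load, but
     * parses method bodies lazily; rev2+ opcodes guarded by a _REV
     * check inside a method are therefore never evaluated on XP. */
    Aml *method = aml_method("MTHD", 0);
    Aml *ifctx = aml_if(aml_lless(aml_name("_REV"), aml_int(2)));
    /* rev1 (XP) path: return the 32-bit variant */
    aml_append(ifctx, aml_return(aml_name("RES3")));
    aml_append(method, ifctx);
    /* rev2+ path: safe to use 64-bit opcodes/objects here */
    aml_append(method, aml_return(aml_name("RES6")));
    aml_append(ssdt, method); /* 'ssdt' is the enclosing Aml container */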
> >> Using XSDT would increase the RAM occupied by the ACPI tables,
> >> as it would duplicate the DSDT + the non-XP-supported AML
> >> at global namespace.
> >
> > Not at all - I posted patches linking to the same
> > tables from both RSDT and XSDT at some point.
>
> Yes, at <http://thread.gmane.org/gmane.comp.emulators.qemu/342559>.
> This could be made to work in OVMF with the above-mentioned
> memoization stuff.
>
> > Only the list of pointers would be different.
>
> I don't recommend that, see the second case above.
>
> Thanks
> Laszlo
>
> >> So far we've managed to keep the DSDT compatible with XP while
> >> introducing features from v2 and higher ACPI revisions as
> >> AML that is only evaluated on demand.
> >> We can continue doing so unless we have to unconditionally
> >> add incompatible AML at global scope.
> >>
> >
> > Yes.
> >
> >>>
> >>>> Michael, Paolo, what do you think about these ideas?
> >>>>
> >>>> Thanks!
> >>>
> >>>
> >>>
> >>> So using the patch below, we can add Name(PQRS, 0x0) at the top of
> >>> the SSDT (or at the bottom, or add a separate SSDT just for that).
> >>> It returns the current offset so we can add that to the linker.
> >>>
> >>> It won't work if you append the Name to the Aml structure (these
> >>> can be nested to arbitrary depth using aml_append), so using a
> >>> plain GArray for this API makes sense to me.
> >>>
> >>> --->
> >>>
> >>> acpi: add build_append_named_dword, returning an offset in buffer
> >>>
> >>> This is a very limited form of support for runtime patching -
> >>> similar in functionality to what we can do with ACPI_EXTRACT
> >>> macros in Python, but implemented in C.
> >>>
> >>> This is to allow ACPI code direct access to data tables -
> >>> which is exactly what DataTableRegion is there for, except that
> >>> no known Windows release so far implements DataTableRegion.
> >> Unsupported means Windows will BSOD, so it's practically
> >> unusable unless MS patches currently existing Windows
> >> versions.
> >
> > Yes. That's why my patch allows patching the SSDT without using
> > DataTableRegion.
> >
> >> Another thing about DataTableRegion is that ACPI tables are
> >> supposed to have static content which matches the checksum in
> >> the table header, while you are trying to use it for dynamic
> >> data. It would be cleaner/more compatible to teach
> >> bios-linker-loader to just allocate memory and patch the AML
> >> with the allocated address.
> >
> > Yes - if the address is static, you need to put it outside
> > the table. It can come right before or right after the table.
> >
> >> Also, if OperationRegion() is used, then one has to patch
> >> DefOpRegion directly, as RegionOffset must be an Integer;
> >> using variable names is not permitted there.
> >
> > I am not sure the comment was understood correctly.
> > What the comment really says is "we can't use DataTableRegion,
> > so here is an alternative".
> >
> >>
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >>>
> >>> ---
> >>>
> >>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> >>> index 1b632dc..f8998ea 100644
> >>> --- a/include/hw/acpi/aml-build.h
> >>> +++ b/include/hw/acpi/aml-build.h
> >>> @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
> >>>  void
> >>>  build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
> >>>
> >>> +int
> >>> +build_append_named_dword(GArray *array, const char *name_format, ...);
> >>> +
> >>>  #endif
> >>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >>> index 0d4b324..7f9fa65 100644
> >>> --- a/hw/acpi/aml-build.c
> >>> +++ b/hw/acpi/aml-build.c
> >>> @@ -262,6 +262,32 @@ static void build_append_int(GArray *table, uint64_t value)
> >>>      }
> >>>  }
> >>>
> >>> +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a dword,
> >>> + * and return the offset to 0x0 for runtime patching.
> >>> + *
> >>> + * Warning: runtime patching is best avoided. Only use this as
> >>> + * a replacement for DataTableRegion (for guests that don't
> >>> + * support it).
> >>> + */
> >>> +int
> >>> +build_append_named_dword(GArray *array, const char *name_format, ...)
> >>> +{
> >>> +    int offset;
> >>> +    va_list ap;
> >>> +
> >>> +    va_start(ap, name_format);
> >>> +    build_append_namestringv(array, name_format, ap);
> >>> +    va_end(ap);
> >>> +
> >>> +    build_append_byte(array, 0x0C); /* DWordPrefix */
> >>> +
> >>> +    offset = array->len;
> >>> +    build_append_int_noprefix(array, 0x0, 4);
> >>> +    assert(array->len == offset + 4);
> >>> +
> >>> +    return offset;
> >>> +}
> >>> +
> >>>  static GPtrArray *alloc_list;
> >>>
> >>>  static Aml *aml_alloc(void)
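To illustrate the intended use of the helper (a hypothetical sketch
only; the table variable is made up, and the PQRS name just echoes the
example in the commentary above):

    /* While building an SSDT: emit a patchable Name(PQRS, 0x0) and
     * remember where its 4-byte integer payload starts. */
    int offset = build_append_named_dword(table_data, "PQRS");

    /* 'offset' indexes the zeroed dword right after the DWordPrefix
     * byte; it can be handed to a linker/loader patching command, or
     * table_data->data + offset can be patched directly on the QEMU
     * side, before the table checksum is computed. */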