On Mon, 4 Jan 2016 21:17:31 +0100
Laszlo Ersek <lersek@xxxxxxxxxx> wrote:

> Michael CC'd me on the grandparent of the email below. I'll try to add
> my thoughts in a single go, with regard to OVMF.
>
> On 12/30/15 20:52, Michael S. Tsirkin wrote:
> > On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
> >> On Mon, 28 Dec 2015 14:50:15 +0200
> >> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >>
> >>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> >>>>
> >>>> Hi Michael, Paolo,
> >>>>
> >>>> Now it is time to return to the challenge of how to reserve a guest
> >>>> physical region internally used by ACPI.
> >>>>
> >>>> Igor suggested that:
> >>>> | An alternative place to allocate reserve from could be high memory.
> >>>> | For pc we have "reserved-memory-end" which currently makes sure
> >>>> | that hotpluggable memory range isn't used by firmware
> >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
>
> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> reason is that nobody wrote that patch, nor asked for the patch to be
> written. (Not implying that just requesting the patch would be
> sufficient for the patch to be written.)
>
> >>> I don't want to tie things to reserved-memory-end because this
> >>> does not scale: next time we need to reserve memory,
> >>> we'll need to find yet another way to figure out what is where.
> >> Could you elaborate a bit more on the problem you're seeing?
> >>
> >> To me it looks like it scales rather well.
> >> For example, let's imagine that we are adding a device
> >> that has some on-device memory that should be mapped into GPA space.
> >> The code to do so would look like:
> >>
> >> pc_machine_device_plug_cb(dev)
> >> {
> >>     ...
> >>     if (dev == OUR_NEW_DEVICE_TYPE) {
> >>         memory_region_add_subregion(as, current_reserved_end, &dev->mr);
> >>         set_new_reserved_end(current_reserved_end +
> >>                              memory_region_size(&dev->mr));
> >>     }
> >> }
> >>
> >> We can practically add any number of new devices that way.
> >
> > Yes, but we'll have to build a host-side allocator for these, and
> > that's nasty. We'll also have to maintain these addresses indefinitely
> > (at least per machine version) as they are guest visible.
> > Not only that, there's no way for the guest to know if we move things
> > around, so basically we'll never be able to change the addresses.
> >
> >
> >>
> >>> I would like the ./hw/acpi/bios-linker-loader.c interface to be
> >>> extended to support 64-bit RAM instead
>
> This looks quite doable in OVMF, as long as the blob to allocate from
> high memory contains *zero* ACPI tables.
>
> (
> Namely, each ACPI table is installed from the containing fw_cfg blob
> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
> own allocation policy for the *copies* of ACPI tables it installs.
>
> This allocation policy is left unspecified in the section of the UEFI
> spec that governs EFI_ACPI_TABLE_PROTOCOL.
>
> The current policy in edk2 (= the reference implementation) seems to be
> "allocate from under 4GB". It is currently being changed to "try to
> allocate from under 4GB, and if that fails, retry from high memory".
> (The change is motivated by AArch64 machines that may have no DRAM at
> all under 4GB.)
> )
>
> >>> (and maybe a way to allocate and
> >>> zero-initialize a buffer without loading it through fw_cfg),
>
> Sounds reasonable.
>
> >>> this way the BIOS
> >>> does the allocation, and addresses can be patched into ACPI.
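For reference, the ALLOCATE command entry in today's linker/loader
interface looks roughly like the abridged sketch below (after
hw/acpi/bios-linker-loader.c). One conceivable way to support 64-bit RAM
would be a new allocation zone; the ZONE_HIGH64 value at the end is
purely hypothetical, not an existing interface:

    /* abridged; see hw/acpi/bios-linker-loader.c for the full layout */
    struct BiosLinkerLoaderEntry {
        uint32_t command;            /* ALLOCATE, ADD_POINTER, ... */
        union {
            struct {
                char     file[BIOS_LINKER_LOADER_FILESZ]; /* fw_cfg blob name */
                uint32_t align;      /* required alignment, power of 2 */
                uint8_t  zone;       /* one of the ZONE_* values below */
            } alloc;
            /* ... ADD_POINTER / ADD_CHECKSUM argument structs ... */
            char pad[124];           /* fixed 128-byte entry size */
        };
    } QEMU_PACKED;

    enum {
        BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH   = 0x1, /* "high" RAM (in practice
                                                       below 4GB today) */
        BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG   = 0x2, /* firmware F-segment */
        BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH64 = 0x3, /* hypothetical: allow
                                                       allocation above 4GB */
    };

Keeping the entry size fixed at 128 bytes means new commands or fields
can be added without breaking the framing of the command stream.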
> >> and then the guest side needs to parse/execute some AML that would
> >> initialize the QEMU side so it would know where to write data.
> >
> > Well, not really - we can put it in a data table, by itself,
> > so it's easy to find.
>
> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
> acpi_get_table_with_size()?
>
> >
> > AML is only needed if access from ACPI is desired.
> >
> >
> >> bios-linker-loader is a great interface for initializing some
> >> guest-owned data and linking it together, but I think it adds
> >> unnecessary complexity and is misused if it's used to handle
> >> device-owned data/on-device memory, as in this case and the VMGID
> >> case.
> >
> > I want a generic interface for the guest to enumerate these things.
> > The linker seems quite reasonable, but if you see a reason why it
> > won't do, or want to propose a better interface, fine.
>
> * The guest could do the following:
> - while processing the ALLOCATE commands, it would make a note where in
> GPA space each fw_cfg blob gets allocated
> - at the end the guest would prepare a temporary array with a
> predefined record format, which associates each fw_cfg blob's name with
> the concrete allocation address
> - it would create an FWCfgDmaAccess structure pointing at this array,
> with a new "control" bit set (or something similar)
> - the guest could write the address of the FWCfgDmaAccess struct to the
> appropriate register, as always.
>
> * Another idea would be a GET_ALLOCATION_ADDRESS linker/loader command
> (sketched below), specifying:
> - the fw_cfg blob's name, for which to retrieve the guest-allocated
> address (this command could only follow the matching ALLOCATE
> command, never precede it)
> - a flag whether the address should be written to IO or MMIO space
> (would likely be IO on x86, MMIO on ARM)
> - a unique uint64_t key (could be the 16-bit fw_cfg selector value that
> identifies the blob, actually!)
> - a uint64_t (IO or MMIO) address to write the unique key and then the
> allocation address to.
>
> Either way, QEMU could learn about all the relevant guest-side
> allocation addresses in a low number of traps. In addition, AML code
> wouldn't have to reflect any allocation addresses to QEMU, ever.
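To make that second idea concrete, here is a minimal sketch of what such
a GET_ALLOCATION_ADDRESS entry could look like if it reused the existing
fixed-size BiosLinkerLoaderEntry format from hw/acpi/bios-linker-loader.c.
The command value and all field names are hypothetical, not an existing
QEMU interface:

    /* hypothetical new command value */
    #define BIOS_LINKER_LOADER_COMMAND_GET_ALLOC_ADDR 0x4

    /* hypothetical new member of the BiosLinkerLoaderEntry union */
    struct {
        /* fw_cfg blob whose guest-side allocation address to report;
         * must match the file of a preceding ALLOCATE command */
        char     file[BIOS_LINKER_LOADER_FILESZ];
        /* unique key, e.g. the blob's 16-bit fw_cfg selector */
        uint64_t key;
        /* register the guest writes the key, and then the 64-bit
         * allocation address, to */
        uint64_t result_reg;
        /* 0: result_reg is an IO port (x86); 1: an MMIO address (ARM) */
        uint8_t  result_is_mmio;
    } get_alloc_addr;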
That would be a nice trick, but I see two issues here:

1. The ACPI tables blob is built atomically once the guest tries to
   read it from fw_cfg, so the patched addresses would have to be
   communicated to QEMU before that.
2. More importantly, I think we would be misusing the linker-loader
   interface here: we would be allocating a buffer in guest RAM and
   thus consuming it, while all the device needs is a window into
   device memory, mapped somewhere outside of the RAM-occupied address
   space.

> >
> > PCI would do, too - though the Windows guys had concerns about
> > returning PCI BARs from ACPI.
> >
> >
> >> There was an RFC on the list to make the BIOS boot from NVDIMM,
> >> already doing some ACPI table lookup/parsing. Now if they were
> >> forced to also parse and execute AML to initialize QEMU with a
> >> guest-allocated address, that would complicate them quite a bit.
> >
> > If they just need to find a table by name, it won't be
> > too bad, would it?
> >
> >> While with the NVDIMM control memory region mapped directly by
> >> QEMU, the respective patches don't need to initialize QEMU in any
> >> way; all they need to do is read the necessary data from the
> >> control region.
> >>
> >> Also, using bios-linker-loader takes away some usable RAM
> >> from the guest, and in the end that doesn't scale:
> >> the more devices I add, the less usable RAM is left for the guest
> >> OS, while all the device needs is a piece of GPA address space
> >> that would belong to it.
> >
> > I don't get this comment. I don't think it's MMIO that is wanted.
> > If it's backed by qemu virtual memory then it's RAM.
> >
> >>>
> >>> See patch at the bottom that might be handy.
>
> I've given up on Microsoft implementing DataTableRegion. (It's sad,
> really.)
>
> From last year I have a WIP version of "docs/vmgenid.txt" that is
> based on Michael's build_append_named_dword() function. If
> GET_ALLOCATION_ADDRESS above looks good, then I could simplify the
> ACPI stuff in that text file (and hopefully post it soon after for
> comments?)
>
> >>>
> >>>> He also innovated a way to use 64-bit addresses in DSDT/SSDT with
> >>>> rev = 1:
> >>>> | when writing ASL one shall make sure that only XP-supported
> >>>> | features are in global scope, which is evaluated when tables
> >>>> | are loaded, and features of rev2 and higher are inside methods.
> >>>> | That way XP doesn't crash, as long as it doesn't evaluate an
> >>>> | unsupported opcode, and one can guard those opcodes by checking
> >>>> | the _REV object if necessary.
> >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> >>>
> >>> Yes, this technique works.
>
> Agreed.
>
> >>>
> >>> An alternative is to add an XSDT; XP ignores that.
> >>> XSDT at the moment breaks OVMF (because it loads both
> >>> the RSDT and the XSDT, which is wrong), but I think
> >>> Laszlo was working on a fix for that.
>
> We have to distinguish two use cases here.
>
> * The first is the case when QEMU prepares both an XSDT and an RSDT,
> and links at least one common ACPI table from both. This would cause
> OVMF to pass the same source (= to-be-copied) table to
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() twice, with one of the
> following outcomes:
>
> - there would be two instances of the same table (think e.g. SSDT)
> - the second attempt would be rejected (e.g. FADT) and that error
> would terminate the linker-loader procedure.
>
> This issue would not be too hard to overcome, with a simple
> "memoization technique". After the initial loading & linking of the
> tables, OVMF could remember the addresses of the "source" ACPI tables,
> and could avoid passing already installed source tables to
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() a second time.
>
> * The second use case is when an ACPI table is linked *only* from
> QEMU's XSDT. This is much harder to fix, because
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() in edk2 links the copy of
> the passed-in table into *both* RSDT and XSDT, automatically. And,
> again, the UEFI spec doesn't provide a way to control this from the
> caller (i.e. from within OVMF).
>
> I tried earlier to effect a change in the specification of
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), on the ASWG and USWG
> mailing lists. (At that time I was trying to expose to the caller the
> UEFI memory *type* from which the copy of the ACPI table being
> installed should be allocated.) Alas, I received no answers at all.
>
> All in all I strongly recommend the "place rev2+ objects in method
> scope" trick over the "link it from the XSDT only" trick.
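As a concrete (if minimal) sketch of that trick, here is roughly how a
guarded method could be built with QEMU's C AML builder API. This is
illustrative only: the MTHD, RES3 and RES6 names are hypothetical, and
the helper signatures follow the API as of early 2016:

    /* XP's rev1 interpreter evaluates global scope at table load, but
     * parses method bodies lazily; rev2+ opcodes guarded by a _REV
     * check inside a method are therefore never evaluated on XP. */
    Aml *method = aml_method("MTHD", 0);
    Aml *ifctx = aml_if(aml_lless(aml_name("_REV"), aml_int(2)));
    /* rev1 (XP) path: return the 32-bit variant */
    aml_append(ifctx, aml_return(aml_name("RES3")));
    aml_append(method, ifctx);
    /* rev2+ path: safe to use 64-bit opcodes/objects here */
    aml_append(method, aml_return(aml_name("RES6")));
    aml_append(ssdt, method); /* 'ssdt' is the enclosing Aml container */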
> >> Using XSDT would increase the RAM occupied by the ACPI tables,
> >> as it would duplicate the DSDT + the non-XP-supported AML
> >> at global namespace.
> >
> > Not at all - I posted patches linking to the same
> > tables from both RSDT and XSDT at some point.
>
> Yes, at <http://thread.gmane.org/gmane.comp.emulators.qemu/342559>.
> This could be made to work in OVMF with the above-mentioned
> memoization stuff.
>
> > Only the list of pointers would be different.
>
> I don't recommend that, see the second case above.
>
> Thanks
> Laszlo
>
> >> So far we've managed to keep the DSDT compatible with XP while
> >> introducing features from v2 and higher ACPI revisions as
> >> AML that is only evaluated on demand.
> >> We can continue doing so unless we have to unconditionally
> >> add incompatible AML at global scope.
> >>
> >
> > Yes.
> >
> >>>
> >>>> Michael, Paolo, what do you think about these ideas?
> >>>>
> >>>> Thanks!
> >>>
> >>>
> >>>
> >>> So using the patch below, we can add Name(PQRS, 0x0) at the top of
> >>> the SSDT (or at the bottom, or add a separate SSDT just for that).
> >>> It returns the current offset so we can add that to the linker.
> >>>
> >>> It won't work if you append the Name to the Aml structure (these
> >>> can be nested to arbitrary depth using aml_append), so using a
> >>> plain GArray for this API makes sense to me.
> >>>
> >>> --->
> >>>
> >>> acpi: add build_append_named_dword, returning an offset in buffer
> >>>
> >>> This is a very limited form of support for runtime patching -
> >>> similar in functionality to what we can do with ACPI_EXTRACT
> >>> macros in Python, but implemented in C.
> >>>
> >>> This is to allow ACPI code direct access to data tables -
> >>> which is exactly what DataTableRegion is there for, except that
> >>> no known Windows release so far implements DataTableRegion.
> >> Unsupported means Windows will BSOD, so it's practically
> >> unusable unless MS patches currently existing Windows
> >> versions.
> >
> > Yes. That's why my patch allows patching the SSDT without using
> > DataTableRegion.
> >
> >> Another thing about DataTableRegion is that ACPI tables are
> >> supposed to have static content which matches the checksum in
> >> the table header, while you are trying to use it for dynamic
> >> data. It would be cleaner/more compatible to teach
> >> bios-linker-loader to just allocate memory and patch the AML
> >> with the allocated address.
> >
> > Yes - if the address is static, you need to put it outside
> > the table. It can come right before or right after the table.
> >
> >> Also, if OperationRegion() is used, then one has to patch
> >> DefOpRegion directly, as RegionOffset must be an Integer;
> >> using variable names is not permitted there.
> >
> > I am not sure the comment was understood correctly.
> > What the comment really says is "we can't use DataTableRegion,
> > so here is an alternative".
> >
> >>
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >>>
> >>> ---
> >>>
> >>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> >>> index 1b632dc..f8998ea 100644
> >>> --- a/include/hw/acpi/aml-build.h
> >>> +++ b/include/hw/acpi/aml-build.h
> >>> @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
> >>>  void
> >>>  build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
> >>>
> >>> +int
> >>> +build_append_named_dword(GArray *array, const char *name_format, ...);
> >>> +
> >>>  #endif
> >>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >>> index 0d4b324..7f9fa65 100644
> >>> --- a/hw/acpi/aml-build.c
> >>> +++ b/hw/acpi/aml-build.c
> >>> @@ -262,6 +262,32 @@ static void build_append_int(GArray *table, uint64_t value)
> >>>      }
> >>>  }
> >>>
> >>> +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a dword,
> >>> + * and return the offset to 0x0 for runtime patching.
> >>> + *
> >>> + * Warning: runtime patching is best avoided. Only use this as
> >>> + * a replacement for DataTableRegion (for guests that don't
> >>> + * support it).
> >>> + */
> >>> +int
> >>> +build_append_named_dword(GArray *array, const char *name_format, ...)
> >>> +{
> >>> +    int offset;
> >>> +    va_list ap;
> >>> +
> >>> +    va_start(ap, name_format);
> >>> +    build_append_namestringv(array, name_format, ap);
> >>> +    va_end(ap);
> >>> +
> >>> +    build_append_byte(array, 0x0C); /* DWordPrefix */
> >>> +
> >>> +    offset = array->len;
> >>> +    build_append_int_noprefix(array, 0x0, 4);
> >>> +    assert(array->len == offset + 4);
> >>> +
> >>> +    return offset;
> >>> +}
> >>> +
> >>>  static GPtrArray *alloc_list;
> >>>
> >>>  static Aml *aml_alloc(void)
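To illustrate the intended use of the helper (a hypothetical sketch
only; the table variable is made up, and the PQRS name just echoes the
example in the commentary above):

    /* While building an SSDT: emit a patchable Name(PQRS, 0x0) and
     * remember where its 4-byte integer payload starts. */
    int offset = build_append_named_dword(table_data, "PQRS");

    /* 'offset' indexes the zeroed dword right after the DWordPrefix
     * byte; it can be handed to a linker/loader patching command, or
     * table_data->data + offset can be patched directly on the QEMU
     * side, before the table checksum is computed. */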