On 01/05/16 18:08, Igor Mammedov wrote:
> On Mon, 4 Jan 2016 21:17:31 +0100
> Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
>
>> Michael CC'd me on the grandparent of the email below. I'll try to add my thoughts in a single go, with regard to OVMF.
>>
>> On 12/30/15 20:52, Michael S. Tsirkin wrote:
>>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>>>> On Mon, 28 Dec 2015 14:50:15 +0200
>>>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>>>>
>>>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
>>>>>>
>>>>>> Hi Michael, Paolo,
>>>>>>
>>>>>> Now it is time to return to the challenge of how to reserve a guest physical region for internal use by ACPI.
>>>>>>
>>>>>> Igor suggested:
>>>>>> | An alternative place to allocate reserve from could be high memory.
>>>>>> | For pc we have "reserved-memory-end" which currently makes sure
>>>>>> | that hotpluggable memory range isn't used by firmware
>>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
>>
>> OVMF has no support for the "reserved-memory-end" fw_cfg file. The reason is that nobody wrote that patch, nor asked for the patch to be written. (Not implying that just requesting the patch would be sufficient for it to be written.)
>>
>>>>> I don't want to tie things to reserved-memory-end because this does not scale: the next time we need to reserve memory, we'll need to find yet another way to figure out what is where.
>>>> Could you elaborate a bit more on the problem you're seeing?
>>>>
>>>> To me it looks like it scales rather well. For example, let's imagine that we are adding a device that has some on-device memory that should be mapped into GPA space. The code to do so would look like:
>>>>
>>>> pc_machine_device_plug_cb(dev)
>>>> {
>>>>     ...
>>>>     if (dev == OUR_NEW_DEVICE_TYPE) {
>>>>         memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>>>>         set_new_reserved_end(current_reserved_end + memory_region_size(&dev->mr));
>>>>     }
>>>> }
>>>>
>>>> We can practically add any number of new devices that way.
>>>
>>> Yes, but we'll have to build a host-side allocator for these, and that's nasty. We'll also have to maintain these addresses indefinitely (at least per machine version), as they are guest visible. Not only that, there's no way for the guest to know if we move things around, so basically we'll never be able to change the addresses.
>>>
>>>>> I would like the ./hw/acpi/bios-linker-loader.c interface to be extended to support 64-bit RAM instead
>>
>> This looks quite doable in OVMF, as long as the blob to allocate from high memory contains *zero* ACPI tables.
>>
>> (
>> Namely, each ACPI table is installed from the containing fw_cfg blob with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its own allocation policy for the *copies* of ACPI tables it installs.
>>
>> This allocation policy is left unspecified in the section of the UEFI spec that governs EFI_ACPI_TABLE_PROTOCOL.
>>
>> The current policy in edk2 (= the reference implementation) seems to be "allocate from under 4GB". It is currently being changed to "try to allocate from under 4GB, and if that fails, retry from high memory". (This is motivated by AArch64 machines that may have no DRAM at all under 4GB.)
>> )
>>
>>>>> (and maybe a way to allocate and zero-initialize a buffer without loading it through fw_cfg),
>>
>> Sounds reasonable.
>>
>>>>> this way the BIOS does the allocation, and the addresses can be patched into the ACPI tables.
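
To make the allocate-and-patch flow just described concrete, here is a small, self-contained C sketch of what guest-side processing of an ADD_POINTER-style command amounts to. The structure layouts, helper names, and the "etc/some-buffer" blob name are simplified illustrations for this discussion, not QEMU's actual bios-linker-loader definitions:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified command layout; QEMU's real one lives in
 * hw/acpi/bios-linker-loader.c and differs in detail. */
struct add_pointer_cmd {
    const char *dest_file;  /* blob containing the pointer to patch   */
    const char *src_file;   /* blob whose address gets written        */
    uint32_t    offset;     /* offset of the pointer inside dest_file */
    uint8_t     size;       /* pointer width: 1, 2, 4 or 8 bytes      */
};

/* Toy allocation registry: blob name -> address the guest placed it at. */
struct allocation {
    const char *file;
    uint8_t    *addr;
};

static uint8_t *find_allocated(const struct allocation *allocs, int n,
                               const char *file)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(allocs[i].file, file) == 0) {
            return allocs[i].addr;
        }
    }
    return NULL;
}

static void do_add_pointer(const struct allocation *allocs, int n,
                           const struct add_pointer_cmd *cmd)
{
    uint8_t *dest = find_allocated(allocs, n, cmd->dest_file);
    uint8_t *src  = find_allocated(allocs, n, cmd->src_file);
    uint64_t val  = 0;

    /* Read the pointer field, add the allocation address, write it
     * back: this is the patching step the thread refers to. */
    memcpy(&val, dest + cmd->offset, cmd->size);
    val += (uint64_t)(uintptr_t)src;
    memcpy(dest + cmd->offset, &val, cmd->size);
}

int main(void)
{
    /* Pretend the guest has downloaded and placed two fw_cfg blobs. */
    static uint8_t table_blob[16]; /* holds a 64-bit pointer at offset 8 */
    static uint8_t data_blob[64];  /* buffer the pointer should reference */

    struct allocation allocs[] = {
        { "etc/acpi/tables", table_blob },
        { "etc/some-buffer", data_blob  },
    };
    struct add_pointer_cmd cmd = { "etc/acpi/tables", "etc/some-buffer", 8, 8 };

    do_add_pointer(allocs, 2, &cmd);

    uint64_t patched;
    memcpy(&patched, table_blob + 8, sizeof(patched));
    printf("patched pointer: 0x%llx\n", (unsigned long long)patched);
    return 0;
}

The point of the scheme is visible in do_add_pointer(): the guest chooses the addresses, and the blobs are patched afterwards, so QEMU never has to hard-code guest physical addresses per machine version.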
>>>> and then the guest side needs to parse/execute some AML that would initialize the QEMU side, so it would know where to write data.
>>>
>>> Well, not really - we can put it in a data table by itself, so it's easy to find.
>>
>> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() / acpi_get_table_with_size()?
>>
>>> AML is only needed if access from ACPI is desired.
>>>
>>>> bios-linker-loader is a great interface for initializing some guest-owned data and linking it together, but I think it adds unnecessary complexity and is misused if it's used to handle device-owned data / on-device memory, as in this case and the VMGID case.
>>>
>>> I want a generic interface for the guest to enumerate these things. The linker seems quite reasonable, but if you see a reason why it won't do, or want to propose a better interface, fine.
>>
>> * The guest could do the following:
>>   - while processing the ALLOCATE commands, it would make a note where in GPA space each fw_cfg blob gets allocated;
>>   - at the end, the guest would prepare a temporary array with a predefined record format that associates each fw_cfg blob's name with the concrete allocation address;
>>   - it would create an FWCfgDmaAccess structure pointing at this array, with a new "control" bit set (or something similar);
>>   - the guest could write the address of the FWCfgDmaAccess struct to the appropriate register, as always.
>>
>> * Another idea would be a GET_ALLOCATION_ADDRESS linker/loader command, specifying:
>>   - the fw_cfg blob's name for which to retrieve the guest-allocated address (this command could only follow the matching ALLOCATE command, never precede it);
>>   - a flag whether the address should be written to IO or MMIO space (likely IO on x86, MMIO on ARM);
>>   - a unique uint64_t key (this could be the 16-bit fw_cfg selector value that identifies the blob, actually!);
>>   - a uint64_t (IO or MMIO) address to write the unique key and then the allocation address to.
>>
>> Either way, QEMU could learn about all the relevant guest-side allocation addresses in a low number of traps. In addition, AML code wouldn't have to reflect any allocation addresses to QEMU, ever.
> That would be a nice trick. I see two issues here:
> 1. The ACPI tables blob is built atomically when a guest tries to read it from fw_cfg, so the patched addresses have to be communicated to QEMU before that.

I don't understand issue #1. I think it is okay if the allocation happens strictly after QEMU refreshes / regenerates the ACPI payload.

Namely, the guest-allocated addresses have two uses:
- references from within the ACPI payload;
- references from the QEMU side, for device operation.

The first purpose is covered by the linker/loader itself (that is, GET_ALLOCATION_ADDRESS would be used *in addition* to ADD_POINTER). The second purpose would be covered by GET_ALLOCATION_ADDRESS.

> 2. More importantly, I think that we are misusing the linker/loader interface here, trying to allocate a buffer from guest RAM and thereby consuming it, while all we need is a window into device memory, mapped somewhere outside of the RAM-occupied address space.

But, more importantly, I definitely see your point with issue #2. I'm neutral on the question whether this should be solved with the ACPI linker/loader or with something else. I'm perfectly fine with "something else", as long as it is generic enough.

The above GET_ALLOCATION_ADDRESS idea is relevant *only if* the ACPI linker/loader is deemed the best solution here. (A rough sketch of such a command record follows below.)
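
For illustration only, the GET_ALLOCATION_ADDRESS record proposed above might be laid out as follows, loosely following the fixed-size record style of the existing linker/loader commands. Every name here, and the command value itself, is hypothetical, invented for this sketch:

#include <stdint.h>

/* Hypothetical GET_ALLOCATION_ADDRESS linker/loader command record;
 * not part of QEMU. The command value assumes the next free slot
 * after the existing ALLOCATE / ADD_POINTER / ADD_CHECKSUM commands. */
#define LOADER_CMD_GET_ALLOCATION_ADDRESS 0x4

#define GET_ALLOC_WRITE_IO   0 /* write key + address to an IO port */
#define GET_ALLOC_WRITE_MMIO 1 /* write key + address to MMIO       */

struct loader_get_allocation_address {
    uint32_t command;     /* LOADER_CMD_GET_ALLOCATION_ADDRESS */
    char     file[56];    /* fw_cfg blob name; must follow the matching
                           * ALLOCATE command, never precede it */
    uint8_t  write_mode;  /* GET_ALLOC_WRITE_IO or GET_ALLOC_WRITE_MMIO */
    uint64_t key;         /* unique key, e.g. the blob's 16-bit fw_cfg
                           * selector widened to 64 bits */
    uint64_t write_addr;  /* IO/MMIO address to write the key and then
                           * the allocation address to */
} __attribute__((packed));

On encountering such a record, the guest would look up the blob's allocation address (noted while processing the earlier ALLOCATE command) and write the key, followed by that address, to write_addr - giving QEMU the guest-chosen GPA in a small number of traps, exactly as described above.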
(Heck, if the linker/loader avenue is rejected here, that's the least work for me! :))

Thanks
Laszlo

>>> PCI would do, too - though the Windows guys had concerns about returning PCI BARs from ACPI.
>>>
>>>> There was an RFC on the list to make the BIOS boot from NVDIMM, already doing some ACPI table lookup/parsing. Now if those patches were forced to also parse and execute AML to initialize QEMU with a guest-allocated address, that would complicate them quite a bit.
>>>
>>> If they just need to find a table by name, it won't be too bad, will it?
>>>
>>>> While with the NVDIMM control memory region mapped directly by QEMU, the respective patches don't need to initialize QEMU in any way; all they need to do is read the necessary data from the control region.
>>>>
>>>> Also, using bios-linker-loader takes away some usable RAM from the guest, and in the end that doesn't scale: the more devices I add, the less usable RAM is left for the guest OS, while all the device needs is a piece of GPA space that would belong to it.
>>>
>>> I don't get this comment. I don't think it's MMIO that is wanted. If it's backed by QEMU virtual memory, then it's RAM.
>>>
>>>>> See the patch at the bottom; it might be handy.
>>
>> I've given up on Microsoft implementing DataTableRegion. (It's sad, really.)
>>
>> From last year I have a WIP version of "docs/vmgenid.txt" that is based on Michael's build_append_named_dword() function. If GET_ALLOCATION_ADDRESS above looks good, then I could simplify the ACPI stuff in that text file (and hopefully post it soon after for comments?)
>>
>>>>>> He also innovated a way to use 64-bit addresses in DSDT/SSDT.rev = 1:
>>>>>> | when writing ASL one shall make sure that only XP-supported
>>>>>> | features are in global scope, which is evaluated when tables
>>>>>> | are loaded, and features of rev2 and higher are inside methods.
>>>>>> | That way XP doesn't crash, as long as it doesn't evaluate unsupported
>>>>>> | opcodes, and one can guard those opcodes by checking the _REV object if necessary.
>>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
>>>>>
>>>>> Yes, this technique works.
>>
>> Agreed.
>>
>>>>> An alternative is to add an XSDT; XP ignores that. The XSDT at the moment breaks OVMF (because it loads both the RSDT and the XSDT, which is wrong), but I think Laszlo was working on a fix for that.
>>
>> We have to distinguish two use cases here.
>>
>> * The first is the case when QEMU prepares both an XSDT and an RSDT, and links at least one common ACPI table from both. This would cause OVMF to pass the same source (= to-be-copied) table to EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() twice, with one of the following outcomes:
>>
>>   - there would be two instances of the same table (think e.g. SSDT), or
>>   - the second attempt would be rejected (e.g. FADT), and that error would terminate the linker/loader procedure.
>>
>> This issue would not be too hard to overcome, with a simple "memoization" technique. After the initial loading & linking of the tables, OVMF could remember the addresses of the "source" ACPI tables, and could avoid passing already installed source tables to EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() a second time.
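
A compilable sketch of that memoization idea (illustrative only, not actual OVMF code) could be as simple as remembering each source-table pointer the first time it is seen:

#include <stdbool.h>
#include <stdio.h>

#define MAX_TABLES 128

/* Source tables already passed to
 * EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(). */
static const void *installed[MAX_TABLES];
static int installed_count;

/* Return true if this source table was seen before; otherwise record
 * it and return false, so the caller installs it exactly once. */
static bool already_installed(const void *source_table)
{
    for (int i = 0; i < installed_count; i++) {
        if (installed[i] == source_table) {
            return true;
        }
    }
    if (installed_count < MAX_TABLES) {
        installed[installed_count++] = source_table;
    }
    return false;
}

int main(void)
{
    char ssdt[16]; /* stand-in for a source ACPI table inside a blob */

    /* First sighting (say, linked from the RSDT): install it. */
    if (!already_installed(ssdt)) {
        printf("installing table\n");
    }
    /* Second sighting (linked again from the XSDT): skipped. */
    if (!already_installed(ssdt)) {
        printf("this line is never reached\n");
    }
    return 0;
}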
>> * The second use case is when an ACPI table is linked *only* from QEMU's XSDT. This is much harder to fix, because EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable() in edk2 links the copy of the passed-in table into *both* the RSDT and the XSDT, automatically. And, again, the UEFI spec doesn't provide a way to control this from the caller (i.e. from within OVMF).
>>
>> I have tried earlier to effect a change in the specification of EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), on the ASWG and USWG mailing lists. (At that time I was trying to expose to the caller the UEFI memory *type* from which the copy of the ACPI table being installed should be allocated.) Alas, I received no answers at all.
>>
>> All in all, I strongly recommend the "place rev2+ objects in method scope" trick over the "link it from the XSDT only" trick.
>>
>>>> Using an XSDT would increase the RAM occupied by the ACPI tables, as it would duplicate the DSDT + the non-XP-supported AML at global namespace.
>>>
>>> Not at all - I posted patches linking to the same tables from both the RSDT and the XSDT at some point.
>>
>> Yes, at <http://thread.gmane.org/gmane.comp.emulators.qemu/342559>. This could be made to work in OVMF with the above-mentioned memoization stuff.
>>
>>> Only the list of pointers would be different.
>>
>> I don't recommend that; see the second case above.
>>
>> Thanks
>> Laszlo
>>
>>>> So far we've managed to keep the DSDT compatible with XP while introducing features from v2 and higher ACPI revisions as AML that is only evaluated on demand. We can continue doing so unless we have to unconditionally add incompatible AML at global scope.
>>>
>>> Yes.
>>>
>>>>>> Michael, Paolo, what do you think about these ideas?
>>>>>>
>>>>>> Thanks!
>>>>>
>>>>> So using the patch below, we can add Name(PQRS, 0x0) at the top of the SSDT (or at the bottom, or in a separate SSDT just for that). It returns the current offset, so we can add that to the linker.
>>>>>
>>>>> It won't work if you append the Name to the Aml structure (these can be nested to arbitrary depth using aml_append), so using a plain GArray for this API makes sense to me.
>>>>>
>>>>> --->
>>>>>
>>>>> acpi: add build_append_named_dword, returning an offset in buffer
>>>>>
>>>>> This is a very limited form of support for runtime patching - similar in functionality to what we can do with the ACPI_EXTRACT macros in Python, but implemented in C.
>>>>>
>>>>> This is to allow ACPI code direct access to data tables - which is exactly what DataTableRegion is there for, except that no known Windows release so far implements DataTableRegion.
>>>> "Unsupported" means Windows will BSOD, so it's practically unusable unless MS patches currently existing Windows versions.
>>>
>>> Yes. That's why my patch allows patching the SSDT without using DataTableRegion.
>>>
>>>> Another thing about DataTableRegion is that ACPI tables are supposed to have static content which matches the checksum in the table header, while you are trying to use it for dynamic data. It would be cleaner/more compatible to teach bios-linker-loader to just allocate memory and patch the AML with the allocated address.
>>>
>>> Yes - if the address is static, you need to put it outside the table. It can come right before or right after it.
>>>
>>>> Also, if OperationRegion() is used, then one has to patch DefOpRegion directly, as RegionOffset must be an Integer; using variable names is not permitted there.
>>>
>>> I am not sure the comment was understood correctly.
>>> The comment really says "we can't use DataTableRegion, so here is an alternative".
>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
>>>>>
>>>>> ---
>>>>>
>>>>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>>>>> index 1b632dc..f8998ea 100644
>>>>> --- a/include/hw/acpi/aml-build.h
>>>>> +++ b/include/hw/acpi/aml-build.h
>>>>> @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
>>>>>  void
>>>>>  build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
>>>>>
>>>>> +int
>>>>> +build_append_named_dword(GArray *array, const char *name_format, ...);
>>>>> +
>>>>>  #endif
>>>>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>>>>> index 0d4b324..7f9fa65 100644
>>>>> --- a/hw/acpi/aml-build.c
>>>>> +++ b/hw/acpi/aml-build.c
>>>>> @@ -262,6 +262,33 @@ static void build_append_int(GArray *table, uint64_t value)
>>>>>      }
>>>>>  }
>>>>>
>>>>> +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a dword,
>>>>> + * and return the offset to 0x0 for runtime patching.
>>>>> + *
>>>>> + * Warning: runtime patching is best avoided. Only use this as
>>>>> + * a replacement for DataTableRegion (for guests that don't
>>>>> + * support it).
>>>>> + */
>>>>> +int
>>>>> +build_append_named_dword(GArray *array, const char *name_format, ...)
>>>>> +{
>>>>> +    int offset;
>>>>> +    va_list ap;
>>>>> +
>>>>> +    build_append_byte(array, 0x08); /* NameOp */
>>>>> +    va_start(ap, name_format);
>>>>> +    build_append_namestringv(array, name_format, ap);
>>>>> +    va_end(ap);
>>>>> +
>>>>> +    build_append_byte(array, 0x0C); /* DWordPrefix */
>>>>> +
>>>>> +    offset = array->len;
>>>>> +    build_append_int_noprefix(array, 0x0, 4);
>>>>> +    assert(array->len == offset + 4);
>>>>> +
>>>>> +    return offset;
>>>>> +}
>>>>> +
>>>>>  static GPtrArray *alloc_list;
>>>>>
>>>>>  static Aml *aml_alloc(void)
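
For readers unfamiliar with the AML encoding involved in the patch above: NAME(PQRS, 0x0) with a dword-encoded integer is just a short, fixed byte sequence, which is what makes the returned patch offset reliable. The standalone sketch below (independent of QEMU) prints those bytes and the offset of the patchable dword:

#include <stdio.h>

/* The AML bytes for NAME(PQRS, 0x0) with the value encoded as a
 * dword, per the ACPI spec's NameOp / DWordPrefix encodings. */
int main(void)
{
    unsigned char aml[] = {
        0x08,                   /* NameOp                           */
        'P', 'Q', 'R', 'S',     /* NameSeg: the 4-character name    */
        0x0C,                   /* DWordPrefix                      */
        0x00, 0x00, 0x00, 0x00, /* 32-bit zero, patched at runtime  */
    };
    /* The patchable dword starts right after the prefix byte; this is
     * the offset a helper like build_append_named_dword() returns. */
    unsigned patch_offset = 6;

    for (unsigned i = 0; i < sizeof(aml); i++) {
        printf("%02x ", aml[i]);
    }
    printf("\n(patchable dword at offset %u)\n", patch_offset);
    return 0;
}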