On 01/07/16 14:42, Igor Mammedov wrote:
> On Thu, 7 Jan 2016 12:54:30 +0200
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>
>> On Thu, Jan 07, 2016 at 11:30:25AM +0100, Igor Mammedov wrote:
>>> On Tue, 5 Jan 2016 18:43:02 +0200
>>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>>>
>>>> On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote:
>>>>>>> bios-linker-loader is a great interface for initializing some
>>>>>>> guest-owned data and linking it together, but I think it adds
>>>>>>> unnecessary complexity and is misused if it's used to handle
>>>>>>> device-owned data/on-device memory in this and the VMGENID cases.
>>>>>>
>>>>>> I want a generic interface for the guest to enumerate these things.
>>>>>> The linker seems quite reasonable, but if you see a reason why it
>>>>>> won't do, or want to propose a better interface, fine.
>>>>>>
>>>>>> PCI would do, too - though the Windows guys had concerns about
>>>>>> returning PCI BARs from ACPI.
>>>>> There were potential issues with the pSeries bootloader, which treated
>>>>> PCI_CLASS_MEMORY_RAM as conventional RAM, but that was fixed.
>>>>> Could you point me to the discussion about the Windows issues?
>>>>>
>>>>> Where the VMGEN patches that used PCI for mapping purposes got
>>>>> stuck was that it was suggested to use the PCI_CLASS_MEMORY_RAM
>>>>> class id, but we couldn't agree on it.
>>>>>
>>>>> VMGEN v13 with the full discussion is here:
>>>>> https://patchwork.ozlabs.org/patch/443554/
>>>>> So to continue with this route we would need to pick some other
>>>>> driverless class id so Windows won't prompt for a driver, or
>>>>> maybe supply our own driver stub to guarantee that no one
>>>>> would touch it. Any suggestions?
>>>>
>>>> Pick any device/vendor id pair for which Windows specifies no driver.
>>>> There's a small risk that this will conflict with some
>>>> guest but I think it's minimal.
>>> The device/vendor id pair was QEMU specific, so it doesn't conflict with
>>> anything. The issue we were trying to solve was to prevent Windows from
>>> asking for a driver, even though it does so only once if told not to
>>> ask again.
>>>
>>> That's why PCI_CLASS_MEMORY_RAM was selected: it's a generic driverless
>>> device descriptor in the INF file which matches as a last resort if
>>> no other driver has matched the device by device/vendor id pair.
>>
>> I think this is the only class in this INF.
>> If you can't use it, you must use an existing device/vendor id pair;
>> there's some risk involved but probably not much.
> I can't wrap my head around this answer, could you rephrase it?
>
> As far as I see we can use PCI_CLASS_MEMORY_RAM with QEMU's device/vendor ids.
> In that case Windows associates it with a dummy "Generic RAM controller".
>
> The same happens with some NVIDIA cards if NVIDIA drivers are not installed;
> if we install the drivers then Windows binds NVIDIA's PCI_CLASS_MEMORY_RAM
> to a concrete driver that manages VRAM the way NVIDIA wants it.
>
> So I think we can use it with low risk.
>
> If we use an existing device/vendor id pair with some driver then the driver
> will fail to initialize, and at minimum we would get the device marked as
> not working in Device Manager. Anyway, if you have a concrete existing
> device/vendor id pair in mind, feel free to suggest it.
>
>>
>>>>
>>>>
>>>>>>
>>>>>>
>>>>>>> There was an RFC on the list to make BIOS boot from NVDIMM that
>>>>>>> already does some ACPI table lookup/parsing. Now if it were forced
>>>>>>> to also parse and execute AML to initialize QEMU with a
>>>>>>> guest-allocated address, that would complicate it quite a bit.
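For concreteness, a minimal sketch of the PCI route discussed above: a QEMU
device with class PCI_CLASS_MEMORY_RAM that exposes its on-device memory as a
RAM-backed BAR. This is illustration only, not the VMGEN v13 code; the 0x0111
device id is made up, and the internal QEMU calls (realize,
memory_region_init_ram, pci_register_bar) are assumed as in the 2016-era tree.

/* Hypothetical sketch only -- not the actual VMGEN v13 code.  A PCI
 * device with class PCI_CLASS_MEMORY_RAM and a RAM-backed BAR; the
 * 0x0111 device id is made up. */
#include "qemu/osdep.h"
#include "hw/pci/pci.h"

typedef struct VmgenState {
    PCIDevice parent_obj;
    MemoryRegion bar;           /* 4 KiB of "on device" memory */
} VmgenState;

static void vmgen_realize(PCIDevice *dev, Error **errp)
{
    VmgenState *s = DO_UPCAST(VmgenState, parent_obj, dev);

    memory_region_init_ram(&s->bar, OBJECT(s), "vmgen.bar", 4096, errp);
    /* guest firmware/OS places the BAR; QEMU never needs AML to learn it */
    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
}

static void vmgen_class_init(ObjectClass *klass, void *data)
{
    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);

    k->realize   = vmgen_realize;
    k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET; /* 0x1af4 */
    k->device_id = 0x0111;                        /* made-up id */
    k->class_id  = PCI_CLASS_MEMORY_RAM; /* binds to "Generic RAM controller" */
}

static const TypeInfo vmgen_info = {
    .name          = "vmgen-test",
    .parent        = TYPE_PCI_DEVICE,
    .instance_size = sizeof(VmgenState),
    .class_init    = vmgen_class_init,
};

static void vmgen_register_types(void)
{
    type_register_static(&vmgen_info);
}

type_init(vmgen_register_types)

With this, the guest sees a 4 KiB BAR it can map anywhere, and QEMU never
needs the guest to report the address back.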
>>>>>> If they just need to find a table by name, it won't be
>>>>>> too bad, would it?
>>>>> That's what they were doing, scanning memory for a static NVDIMM table.
>>>>> However, if it were a DataTable, the BIOS side would have to execute
>>>>> AML so that the table address could be told to QEMU.
>>>>
>>>> Not at all. You can find any table by its signature without
>>>> parsing AML.
>>> Yep, and then the BIOS would need to tell QEMU its address by
>>> writing to an IO port which is allocated statically in QEMU
>>> for this purpose and is described in AML only on the guest side.
>>
>> IO ports are an ABI too but they are way easier to
>> maintain.
> It's pretty much the same as GPA addresses, only a much more limited resource.
> Otherwise one has to do the same tricks to maintain the ABI.
>
>>
>>>>
>>>>
>>>>> In case of direct mapping or a PCI BAR there is no need to initialize
>>>>> the QEMU side from AML.
>>>>> That also saves us the IO port where this address would be written
>>>>> if the bios-linker-loader approach were used.
>>>>>
>>>>>>
>>>>>>> While with the NVDIMM control memory region mapped directly by QEMU,
>>>>>>> the respective patches don't need to initialize QEMU in any way;
>>>>>>> all they need is to read the necessary data from the control region.
>>>>>>>
>>>>>>> Also, using bios-linker-loader takes away some usable RAM
>>>>>>> from the guest, and in the end that doesn't scale:
>>>>>>> the more devices I add, the less usable RAM is left for the guest OS,
>>>>>>> while all the device needs is a piece of GPA address space
>>>>>>> that would belong to it.
>>>>>>
>>>>>> I don't get this comment. I don't think it's MMIO that is wanted.
>>>>>> If it's backed by QEMU virtual memory then it's RAM.
>>>>> Then why not allocate video card VRAM the same way and try to explain
>>>>> to the user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb'
>>>>> only has 64Mb of available RAM because we think that on-device VRAM
>>>>> is also RAM.
>>>>>
>>>>> Maybe I've used the MMIO term wrongly here, but it roughly reflects the
>>>>> idea that on-device memory (whether it's VRAM, an NVDIMM control block
>>>>> or the VMGEN area) is not allocated from the guest's usable RAM (as
>>>>> described in E820) but rather mapped directly into the guest's GPA space,
>>>>> and doesn't consume available RAM as the guest sees it. That's also the
>>>>> way it's done on real hardware.
>>>>>
>>>>> What we need in the case of VMGEN ID and NVDIMM is on-device memory
>>>>> that can be directly accessed by the guest.
>>>>> Both direct mapping and a PCI BAR do that job, and we could use simple
>>>>> static AML without any patching.
>>>>
>>>> At least with VMGEN the issue is that there's an AML method
>>>> that returns the physical address.
>>>> Then if the guest OS moves the BAR (which is legal), it will break,
>>>> since the caller has no way to know it's related to the BAR.
>>> I've found the following MS doc: "Firmware Allocation of PCI Device
>>> Resources in Windows". It looks like when MS implemented resource
>>> rebalancing in Vista they pushed a compat change to the PCI specs.
>>> That ECN is called "Ignore PCI Boot Configuration _DSM Function"
>>> and can be found here:
>>> https://pcisig.com/sites/default/files/specification_documents/ECR-Ignorebootconfig-final.pdf
>>>
>>> It looks like it's possible to forbid rebalancing per
>>> device/bridge if it has a _DSM method that returns "do not
>>> ignore the boot configuration of PCI resources".
>>
>> I'll have to study this but we don't want that
>> globally, do we?
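Coming back to the find-by-signature point above: the lookup a BIOS or
bootloader needs is just a walk of the RSDT's pointer array, no AML involved.
A sketch, assuming a firmware-style environment where physical addresses are
directly dereferenceable (real code would map the RSDT first):

/* Find a table by its 4-character signature by walking the RSDT's
 * 32-bit entry array -- no AML interpretation needed. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ACPI_SDT_HEADER_LEN 36   /* fixed size of every ACPI table header */

typedef struct {
    char     signature[4];
    uint32_t length;             /* whole table, header included */
    /* revision, checksum, OEM fields, creator fields follow */
} AcpiSdtPrefix;

static void *acpi_find_table(const AcpiSdtPrefix *rsdt, const char sig[4])
{
    /* the RSDT payload is an array of 32-bit physical table pointers */
    const uint32_t *entry = (const uint32_t *)((const char *)rsdt +
                                               ACPI_SDT_HEADER_LEN);
    size_t count = (rsdt->length - ACPI_SDT_HEADER_LEN) / sizeof(uint32_t);

    for (size_t i = 0; i < count; i++) {
        AcpiSdtPrefix *t = (AcpiSdtPrefix *)(uintptr_t)entry[i];
        if (!memcmp(t->signature, sig, 4)) {
            return t;                 /* e.g. sig == "NFIT" for NVDIMM */
        }
    }
    return NULL;
}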
> No need to do it globally; adding a _DSM to a device we don't wish
> to be rebalanced should be sufficient to lock down specific resources.
>
> Actually the existence of the spec implies that if there is a
> boot-configured device with resources described in an ACPI table and
> there isn't a _DSM method enabling rebalancing for it, then rebalancing
> is not permitted.
> It should be easy to run an experiment to verify what Windows would do.
>
> So if this approach works and we agree on going with it, I could work
> on redoing the VMGEN v13 series using _DSM as described.
> That would simplify implementing this kind of device vs the bios-linker
> approach, i.e.:
> - free the RAM occupied by the linker blob
> - free the IO port
> - avoid 2 or 3 layers of indirection, which makes the code much easier
>   to understand
> - avoid runtime AML patching and simplify the AML and its composing parts
> - there won't be a need for the BIOS to get an IO port from fw_cfg and
>   write the GPA there, and no need for a table lookup either
> - much easier to write unit tests, i.e. use the same qtest device testing
>   technique without the necessity of running actual guest code,
>   i.e. no binary code blobs like we have for running the bios-tables
>   test in TCG mode

No objections on my part. If it works, it works for me!

Laszlo

>
>
>> This restricts hotplug functionality significantly.
>>
>>>
>>>>>>>>
>>>>>>>> See the patch at the bottom that might be handy.
>>>>>>>>
>>>>>>>>> He also innovated a way to use 64-bit addresses in DSDT/SSDT.rev = 1:
>>>>>>>>> | when writing ASL one shall make sure that only XP supported
>>>>>>>>> | features are in global scope, which is evaluated when tables
>>>>>>>>> | are loaded, and features of rev2 and higher are inside methods.
>>>>>>>>> | That way XP doesn't crash as far as it doesn't evaluate unsupported
>>>>>>>>> | opcodes, and one can guard those opcodes checking the _REV object
>>>>>>>>> | if necessary.
>>>>>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
>>>>>>>>
>>>>>>>> Yes, this technique works.
>>>>>>>>
>>>>>>>> An alternative is to add an XSDT; XP ignores that.
>>>>>>>> XSDT at the moment breaks OVMF (because it loads both
>>>>>>>> the RSDT and the XSDT, which is wrong), but I think
>>>>>>>> Laszlo was working on a fix for that.
>>>>>>> Using XSDT would increase the RAM occupied by ACPI tables,
>>>>>>> as it would duplicate the DSDT + non-XP-supported AML
>>>>>>> at global namespace.
>>>>>>
>>>>>> Not at all - I posted patches linking to the same
>>>>>> tables from both RSDT and XSDT at some point.
>>>>>> Only the list of pointers would be different.
>>>>> If you put XP-incompatible AML in a separate SSDT and link it
>>>>> only from the XSDT, then that would work; but if the incompatibility
>>>>> is in the DSDT, one would have to provide a compat DSDT for the RSDT
>>>>> and an incompat DSDT for the XSDT.
>>>>
>>>> So don't do this.
>>> Well, the spec says "An ACPI-compatible OS must use the XSDT if present",
>>> which I read as: tables pointed to by the RSDT MUST be pointed to by the
>>> XSDT as well, and the RSDT MUST NOT be used.
>>>
>>> So if we put incompatible changes in a separate SSDT and put
>>> it only in the XSDT, that might work. The showstopper here is OVMF,
>>> which has issues with it, as Laszlo pointed out.
>>
>> But that's just a bug.
>>
>>> Also, since Windows implements only a subset of the spec, the XSDT trick
>>> would cover only XP-based versions, while the rest will see and
>>> use the XSDT-pointed tables, which could still have AML incompatible
>>> with some of the later Windows versions.
>>
>> We'll have to see what these are exactly.
>> If it's methods in an SSDT we can check the version supported
>> by the OSPM.
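For concreteness, the per-device _DSM described above could be composed with
QEMU's aml_* helpers roughly as follows. This is an untested sketch: the
helper signatures are assumed from the 2016-era aml-build API, and a complete
_DSM would also have to answer function 0 (the query function) with a bitmap
of supported functions. Function 5 under the PCI Firmware Spec _DSM UUID is
the "Ignore PCI Boot Configuration" function; returning 0 tells the OS to
keep the firmware-assigned resources, i.e. not to rebalance this device.

/* Hedged sketch, not a tested patch: per-device _DSM implementing only
 * function 5, "Ignore PCI Boot Configuration", from the ECN above. */
#include "hw/acpi/aml-build.h"

static Aml *build_no_rebalance_dsm(void)
{
    Aml *method = aml_method("_DSM", 4); /* args: UUID, Rev, Func, Package */
    Aml *uuid_ok = aml_if(aml_equal(aml_arg(0),
        aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D")));
    Aml *func5 = aml_if(aml_equal(aml_arg(2), aml_int(5)));

    aml_append(func5, aml_return(aml_int(0))); /* 0 == keep boot config */
    aml_append(uuid_ok, func5);
    aml_append(method, uuid_ok);
    /* a real _DSM also needs function 0 (query) returning the bitmap of
     * supported functions; omitted here for brevity */
    aml_append(method, aml_return(aml_int(0)));
    return method;
}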
> I see only VAR_PACKAGE as such an object so far.
>
> A 64-bit PCI0._CRS probably won't crash 32-bit Vista and later, as
> they should be able to parse 64-bit Integers as defined by ACPI 2.0.
>
>>
>>>
>>>>
>>>>> So far the policy was: don't try to run a guest OS on a QEMU
>>>>> configuration that it doesn't support.
>>>>
>>>> It's better if guests don't see some features but
>>>> don't crash. It's not always possible of course, but
>>>> we should try to avoid this.
>>>>
>>>>> For example, we use VAR_PACKAGE when running with more
>>>>> than 255 VCPUs (commit b4f4d5481), which BSODs XP.
>>>>
>>>> Yes. And it's because we violate the spec; the DSDT
>>>> should not have this stuff.
>>>>
>>>>> So we can continue with that policy without resorting to
>>>>> using both RSDT and XSDT.
>>>>> It would be even easier, as all AML would be dynamically
>>>>> generated and the DSDT would only contain AML elements for
>>>>> a concrete QEMU configuration.
>>>>
>>>> I'd prefer XSDT but I won't nack it if you do it in the DSDT.
>>>> I think it's not spec compliant but guests do not
>>>> seem to care.
>>>>
>>>>>>> So far we've managed to keep the DSDT compatible with XP while
>>>>>>> introducing features from v2 and higher ACPI revisions as
>>>>>>> AML that is only evaluated on demand.
>>>>>>> We can continue doing so unless we have to unconditionally
>>>>>>> add incompatible AML at global scope.
>>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>>
>>>>>>>>> Michael, Paolo, what do you think about these ideas?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So using the patch below, we can add Name(PQRS, 0x0) at the top of the
>>>>>>>> SSDT (or bottom, or add a separate SSDT just for that). It returns the
>>>>>>>> current offset so we can add that to the linker.
>>>>>>>>
>>>>>>>> It won't work if you append the Name to the Aml structure (these can be
>>>>>>>> nested to arbitrary depth using aml_append), so using a plain GArray for
>>>>>>>> this API makes sense to me.
>>>>>>>>
>>>>>>>> --->
>>>>>>>>
>>>>>>>> acpi: add build_append_named_qword, returning an offset in buffer
>>>>>>>>
>>>>>>>> This is a very limited form of support for runtime patching -
>>>>>>>> similar in functionality to what we can do with ACPI_EXTRACT
>>>>>>>> macros in python, but implemented in C.
>>>>>>>>
>>>>>>>> This is to allow ACPI code direct access to data tables -
>>>>>>>> which is exactly what DataTableRegion is there for, except
>>>>>>>> no known Windows release so far implements DataTableRegion.
>>>>>>> Unsupported means Windows will BSOD, so it's practically
>>>>>>> unusable unless MS patches currently existing Windows
>>>>>>> versions.
>>>>>>
>>>>>> Yes. That's why my patch allows patching the SSDT without using
>>>>>> DataTableRegion.
>>>>>>
>>>>>>> Another thing about DataTableRegion is that ACPI tables are
>>>>>>> supposed to have static content which matches the checksum in
>>>>>>> the table header, while you are trying to use it for dynamic
>>>>>>> data. It would be cleaner/more compatible to teach
>>>>>>> bios-linker-loader to just allocate memory and patch the AML
>>>>>>> with the allocated address.
>>>>>>
>>>>>> Yes - if the address is static, you need to put it outside
>>>>>> the table. It can come right before or right after it.
>>>>>>
>>>>>>> Also, if OperationRegion() is used, then one has to patch
>>>>>>> DefOpRegion directly, as RegionOffset must be an Integer;
>>>>>>> using variable names is not permitted there.
>>>>>>
>>>>>> I am not sure the comment was understood correctly.
>>>>>> The comment really says "we can't use DataTableRegion,
>>>>>> so here is an alternative".
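A note on the RSDT/XSDT option that keeps coming up: both tables are just a
36-byte header plus a pointer array, so pointing both at the same tables
costs only the second array - 4 bytes per table in the RSDT, 8 in the XSDT.
A standalone sketch of the idea (the helper name is made up, not QEMU's):

/* Standalone sketch of the "same tables, two pointer lists" idea: the
 * RSDT body is an array of 32-bit table addresses, the XSDT body an
 * array of 64-bit ones; nothing else is duplicated. */
#include <glib.h>
#include <stdint.h>

/* append one table address to both pointer lists */
static void sdt_lists_add(GArray *rsdt_body, GArray *xsdt_body, uint64_t addr)
{
    uint32_t addr32 = (uint32_t)addr;  /* tables below 4G for the RSDT */

    g_array_append_vals(rsdt_body, &addr32, sizeof(addr32));
    g_array_append_vals(xsdt_body, &addr, sizeof(addr));
}

An XP-incompatible SSDT would then be appended to the XSDT list only.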
>>>>> So how are you going to access the data that the patched
>>>>> NameString points to?
>>>>> For that you'd need a normal patched OperationRegion
>>>>> as well, since DataTableRegion isn't usable.
>>>>
>>>> For VMGENID you would patch the method that
>>>> returns the address - you do not need an op region
>>>> as you never access it.
>>>>
>>>> I don't know about NVDIMM. Maybe OperationRegion can
>>>> use the patched NameString? Will need some thought.
>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>>>>>>>> index 1b632dc..f8998ea 100644
>>>>>>>> --- a/include/hw/acpi/aml-build.h
>>>>>>>> +++ b/include/hw/acpi/aml-build.h
>>>>>>>> @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
>>>>>>>>  void
>>>>>>>>  build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
>>>>>>>>
>>>>>>>> +int
>>>>>>>> +build_append_named_qword(GArray *array, const char *name_format, ...);
>>>>>>>> +
>>>>>>>>  #endif
>>>>>>>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>>>>>>>> index 0d4b324..7f9fa65 100644
>>>>>>>> --- a/hw/acpi/aml-build.c
>>>>>>>> +++ b/hw/acpi/aml-build.c
>>>>>>>> @@ -262,6 +262,33 @@ static void build_append_int(GArray *table, uint64_t value)
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a qword,
>>>>>>>> + * and return the offset to 0x0 for runtime patching.
>>>>>>>> + *
>>>>>>>> + * Warning: runtime patching is best avoided. Only use this as
>>>>>>>> + * a replacement for DataTableRegion (for guests that don't
>>>>>>>> + * support it).
>>>>>>>> + */
>>>>>>>> +int
>>>>>>>> +build_append_named_qword(GArray *array, const char *name_format, ...)
>>>>>>>> +{
>>>>>>>> +    int offset;
>>>>>>>> +    va_list ap;
>>>>>>>> +
>>>>>>>> +    build_append_byte(array, 0x08); /* NameOp */
>>>>>>>> +
>>>>>>>> +    va_start(ap, name_format);
>>>>>>>> +    build_append_namestringv(array, name_format, ap);
>>>>>>>> +    va_end(ap);
>>>>>>>> +
>>>>>>>> +    build_append_byte(array, 0x0E); /* QWordPrefix */
>>>>>>>> +
>>>>>>>> +    offset = array->len;
>>>>>>>> +    build_append_int_noprefix(array, 0x0, 8);
>>>>>>>> +    assert(array->len == offset + 8);
>>>>>>>> +
>>>>>>>> +    return offset;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static GPtrArray *alloc_list;
>>>>>>>>
>>>>>>>>  static Aml *aml_alloc(void)
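For reference, the encoding trick the patch relies on can be shown standalone:
emit NameOp, a NameSeg, QWordPrefix and a zero placeholder, remember the
placeholder's offset, and patch the real address in later. A minimal
self-contained demo (the "VGIA" name and the address value are made up):

/* Standalone demo of the technique in the patch above: emit
 * Name(VGIA, <qword 0>), remember where the constant sits, and patch
 * the real guest address in later -- the same idea, minus the GArray
 * plumbing. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* appends Name(<name>, <qword 0>) at buf + *len;
 * returns the offset of the 8-byte constant */
static size_t append_named_qword(uint8_t *buf, size_t *len, const char name[4])
{
    size_t off;

    buf[(*len)++] = 0x08;              /* NameOp */
    memcpy(buf + *len, name, 4);       /* single NameSeg, no root prefix */
    *len += 4;
    buf[(*len)++] = 0x0E;              /* QWordPrefix */
    off = *len;
    memset(buf + *len, 0, 8);          /* placeholder for the patcher */
    *len += 8;
    return off;
}

int main(void)
{
    uint8_t aml[32];
    size_t len = 0;
    size_t off = append_named_qword(aml, &len, "VGIA");
    uint64_t gpa = 0x00000000fc000000ull;  /* made-up example address */

    /* the "runtime patching" step; assumes a little-endian host, which
     * matches AML's little-endian integer encoding */
    memcpy(aml + off, &gpa, sizeof(gpa));
    assert(off == 6 && len == 14);
    printf("qword patched at offset %zu\n", off);
    return 0;
}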