On Thu, 7 Jan 2016 12:54:30 +0200 "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: > On Thu, Jan 07, 2016 at 11:30:25AM +0100, Igor Mammedov wrote: > > On Tue, 5 Jan 2016 18:43:02 +0200 > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: > > > > > On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote: > > > > > > bios-linker-loader is a great interface for initializing some > > > > > > guest owned data and linking it together but I think it adds > > > > > > unnecessary complexity and is misused if it's used to handle > > > > > > device owned data/on device memory in this and VMGID cases. > > > > > > > > > > I want a generic interface for guest to enumerate these things. linker > > > > > seems quite reasonable but if you see a reason why it won't do, or want > > > > > to propose a better interface, fine. > > > > > > > > > > PCI would do, too - though windows guys had concerns about > > > > > returning PCI BARs from ACPI. > > > > There were potential issues with pSeries bootloader that treated > > > > PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed. > > > > Could you point out to discussion about windows issues? > > > > > > > > What VMGEN patches that used PCI for mapping purposes were > > > > stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM > > > > class id but we couldn't agree on it. > > > > > > > > VMGEN v13 with full discussion is here > > > > https://patchwork.ozlabs.org/patch/443554/ > > > > So to continue with this route we would need to pick some other > > > > driver less class id so windows won't prompt for driver or > > > > maybe supply our own driver stub to guarantee that no one > > > > would touch it. Any suggestions? > > > > > > Pick any device/vendor id pair for which windows specifies no driver. > > > There's a small risk that this will conflict with some > > > guest but I think it's minimal. > > device/vendor id pair was QEMU specific so doesn't conflicts with anything > > issue we were trying to solve was to prevent windows asking for driver > > even though it does so only once if told not to ask again. > > > > That's why PCI_CLASS_MEMORY_RAM was selected as it's generic driver-less > > device descriptor in INF file which matches as the last resort if > > there isn't any other diver that's matched device by device/vendor id pair. > > I think this is the only class in this inf. > If you can't use it, you must use an existing device/vendor id pair, > there's some risk involved but probably not much. I can't wrap my head around this answer, could you rephrase it? As far as I see we can use PCI_CLASS_MEMORY_RAM with qemu's device/vendor ids. In that case Windows associates it with dummy "Generic RAM controller". The same happens with some NVIDIA cards if NVIDIA drivers are not installed, if we install drivers then Windows binds NVIDIA's PCI_CLASS_MEMORY_RAM with concrete driver that manages VRAM the way NVIDIA wants it. So I think we can use it with low risk. If we use existing device/vendor id pair with some driver then driver will fail to initialize and as minimum we would get device marked as not working in Device-Manager. Any way if you have in mind a concrete existing device/vendor id pair feel free to suggest it. > > > > > > > > > > > > > > > > > > > > > > > There was RFC on list to make BIOS boot from NVDIMM already > > > > > > doing some ACPI table lookup/parsing. Now if they were forced > > > > > > to also parse and execute AML to initialize QEMU with guest > > > > > > allocated address that would complicate them quite a bit. > > > > > > > > > > If they just need to find a table by name, it won't be > > > > > too bad, would it? > > > > that's what they were doing scanning memory for static NVDIMM table. > > > > However if it were DataTable, BIOS side would have to execute > > > > AML so that the table address could be told to QEMU. > > > > > > Not at all. You can find any table by its signature without > > > parsing AML. > > yep, and then BIOS would need to tell its address to QEMU > > writing to IO port which is allocated statically in QEMU > > for this purpose and is described in AML only on guest side. > > io ports are an ABI too but they are way easier to > maintain. It's pretty much the same as GPA addresses only it's much more limited resource. Otherwise one has to do the same tricks to maintain ABI. > > > > > > > > > > > In case of direct mapping or PCI BAR there is no need to initialize > > > > QEMU side from AML. > > > > That also saves us IO port where this address should be written > > > > if bios-linker-loader approach is used. > > > > > > > > > > > > > > > While with NVDIMM control memory region mapped directly by QEMU, > > > > > > respective patches don't need in any way to initialize QEMU, > > > > > > all they would need just read necessary data from control region. > > > > > > > > > > > > Also using bios-linker-loader takes away some usable RAM > > > > > > from guest and in the end that doesn't scale, > > > > > > the more devices I add the less usable RAM is left for guest OS > > > > > > while all the device needs is a piece of GPA address space > > > > > > that would belong to it. > > > > > > > > > > I don't get this comment. I don't think it's MMIO that is wanted. > > > > > If it's backed by qemu virtual memory then it's RAM. > > > > Then why don't allocate video card VRAM the same way and try to explain > > > > user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb' > > > > only has 64Mb of available RAM because of we think that on device VRAM > > > > is also RAM. > > > > > > > > Maybe I've used MMIO term wrongly here but it roughly reflects the idea > > > > that on device memory (whether it's VRAM, NVDIMM control block or VMGEN > > > > area) is not allocated from guest's usable RAM (as described in E820) > > > > but rather directly mapped in guest's GPA and doesn't consume available > > > > RAM as guest sees it. That's also the way it's done on real hardware. > > > > > > > > What we need in case of VMGEN ID and NVDIMM is on device memory > > > > that could be directly accessed by guest. > > > > Both direct mapping or PCI BAR do that job and we could use simple > > > > static AML without any patching. > > > > > > At least with VMGEN the issue is that there's an AML method > > > that returns the physical address. > > > Then if guest OS moves the BAR (which is legal), it will break > > > since caller has no way to know it's related to the BAR. > > I've found a following MS doc "Firmware Allocation of PCI Device Resources in Windows". It looks like when MS implemented resource rebalancing in > > Vista they pushed a compat change to PCI specs. > > That ECN is called "Ignore PCI Boot Configuration_DSM Function" > > and can be found here: > > https://pcisig.com/sites/default/files/specification_documents/ECR-Ignorebootconfig-final.pdf > > > > It looks like it's possible to forbid rebalancing per > > device/bridge if it has _DMS method that returns "do not > > ignore the boot configuration of PCI resources". > > I'll have to study this but we don't want that > globally, do we? no need to do it globally, adding _DSM to a device, we don't wish to be rebalanced, should be sufficient to lock down specific resources. actually existence of spec implies that if there is a boot configured device with resources described in ACPI table and there isn't _DSM method enabling rebalancing for it, then rebalancing is not permitted. It should be easy to make an experiment to verify what Windows would do. So if this approach would work and we agree on going with it, I could work on redoing VMGENv13 series using _DSM as described. That would simplify implementing this kind of devices vs bios-linker approach i.e.: - free RAM occupied by linker blob - free IO port - avoid 2 or 3 layers of indirection - which makes understanding of code much easier - avoid runtime AML patching and simplify AML and its composing parts - there won't be need for BIOS to get IO port from fw_cfg and write there GPA as well no need for table lookup. - much easier to write unit tests, i.e. use the same qtest device testing technique without necessity of running actual guest code. i.e. no binary code blobs like we have for running bios-tables test in TCG mode. > This restricts hotplug functionality significantly. > > > > > > > > > > > > > > > > > See patch at the bottom that might be handy. > > > > > > > > > > > > > > > he also innovated a way to use 64-bit address in DSDT/SSDT.rev = 1: > > > > > > > > | when writing ASL one shall make sure that only XP supported > > > > > > > > | features are in global scope, which is evaluated when tables > > > > > > > > | are loaded and features of rev2 and higher are inside methods. > > > > > > > > | That way XP doesn't crash as far as it doesn't evaluate unsupported > > > > > > > > | opcode and one can guard those opcodes checking _REV object if neccesary. > > > > > > > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html) > > > > > > > > > > > > > > Yes, this technique works. > > > > > > > > > > > > > > An alternative is to add an XSDT, XP ignores that. > > > > > > > XSDT at the moment breaks OVMF (because it loads both > > > > > > > the RSDT and the XSDT, which is wrong), but I think > > > > > > > Laszlo was working on a fix for that. > > > > > > Using XSDT would increase ACPI tables occupied RAM > > > > > > as it would duplicate DSDT + non XP supported AML > > > > > > at global namespace. > > > > > > > > > > Not at all - I posted patches linking to same > > > > > tables from both RSDT and XSDT at some point. > > > > > Only the list of pointers would be different. > > > > if you put XP incompatible AML in separate SSDT and link it > > > > only from XSDT than that would work but if incompatibility > > > > is in DSDT, one would have to provide compat DSDT for RSDT > > > > an incompat DSDT for XSDT. > > > > > > So don't do this. > > well spec says "An ACPI-compatible OS must use the XSDT if present", > > which I read as tables pointed by RSDT MUST be pointed by XSDT > > as well and RSDT MUST NOT not be used. > > > > so if we put incompatible changes in a separate SSDT and put > > it only in XSDT that might work. Showstopper here is OVMF which > > has issues with it as Laszlo pointed out. > > But that's just a bug. > > > Also since Windows implements only subset of spec XSDT trick > > would cover only XP based versions while the rest will see and > > use XSDT pointed tables which still could have incompatible > > AML with some of the later windows versions. > > We'll have to see what these are exactly. > If it's methods in SSDT we can check the version supported > by the ASPM. I see only VAR_PACKAGE as such object so far. 64-bit PCI0._CRS probably won't crash 32-bit Vista and later as it should be able to parse 64-bit Integers as defined by ACPI 2.0. > > > > > > > > > > So far policy was don't try to run guest OS on QEMU > > > > configuration that isn't supported by it. > > > > > > It's better if guests don't see some features but > > > don't crash. It's not always possible of course but > > > we should try to avoid this. > > > > > > > For example we use VAR_PACKAGE when running with more > > > > than 255 VCPUs (commit b4f4d5481) which BSODs XP. > > > > > > Yes. And it's because we violate the spec, DSDT > > > should not have this stuff. > > > > > > > So we can continue with that policy with out resorting to > > > > using both RSDT and XSDT, > > > > It would be even easier as all AML would be dynamically > > > > generated and DSDT would only contain AML elements for > > > > a concrete QEMU configuration. > > > > > > I'd prefer XSDT but I won't nack it if you do it in DSDT. > > > I think it's not spec compliant but guests do not > > > seem to care. > > > > > > > > > So far we've managed keep DSDT compatible with XP while > > > > > > introducing features from v2 and higher ACPI revisions as > > > > > > AML that is only evaluated on demand. > > > > > > We can continue doing so unless we have to unconditionally > > > > > > add incompatible AML at global scope. > > > > > > > > > > > > > > > > Yes. > > > > > > > > > > > > > > > > > > > > Michael, Paolo, what do you think about these ideas? > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > > > So using a patch below, we can add Name(PQRS, 0x0) at the top of the > > > > > > > SSDT (or bottom, or add a separate SSDT just for that). It returns the > > > > > > > current offset so we can add that to the linker. > > > > > > > > > > > > > > Won't work if you append the Name to the Aml structure (these can be > > > > > > > nested to arbitrary depth using aml_append), so using plain GArray for > > > > > > > this API makes sense to me. > > > > > > > > > > > > > > ---> > > > > > > > > > > > > > > acpi: add build_append_named_dword, returning an offset in buffer > > > > > > > > > > > > > > This is a very limited form of support for runtime patching - > > > > > > > similar in functionality to what we can do with ACPI_EXTRACT > > > > > > > macros in python, but implemented in C. > > > > > > > > > > > > > > This is to allow ACPI code direct access to data tables - > > > > > > > which is exactly what DataTableRegion is there for, except > > > > > > > no known windows release so far implements DataTableRegion. > > > > > > unsupported means Windows will BSOD, so it's practically > > > > > > unusable unless MS will patch currently existing Windows > > > > > > versions. > > > > > > > > > > Yes. That's why my patch allows patching SSDT without using > > > > > DataTableRegion. > > > > > > > > > > > Another thing about DataTableRegion is that ACPI tables are > > > > > > supposed to have static content which matches checksum in > > > > > > table the header while you are trying to use it for dynamic > > > > > > data. It would be cleaner/more compatible to teach > > > > > > bios-linker-loader to just allocate memory and patch AML > > > > > > with the allocated address. > > > > > > > > > > Yes - if address is static, you need to put it outside > > > > > the table. Can come right before or right after this. > > > > > > > > > > > Also if OperationRegion() is used, then one has to patch > > > > > > DefOpRegion directly as RegionOffset must be Integer, > > > > > > using variable names is not permitted there. > > > > > > > > > > I am not sure the comment was understood correctly. > > > > > The comment says really "we can't use DataTableRegion > > > > > so here is an alternative". > > > > so how are you going to access data at which patched > > > > NameString point to? > > > > for that you'd need a normal patched OperationRegion > > > > as well since DataTableRegion isn't usable. > > > > > > For VMGENID you would patch the method that > > > returns the address - you do not need an op region > > > as you never access it. > > > > > > I don't know about NVDIMM. Maybe OperationRegion can > > > use the patched NameString? Will need some thought. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h > > > > > > > index 1b632dc..f8998ea 100644 > > > > > > > --- a/include/hw/acpi/aml-build.h > > > > > > > +++ b/include/hw/acpi/aml-build.h > > > > > > > @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre); > > > > > > > void > > > > > > > build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets); > > > > > > > > > > > > > > +int > > > > > > > +build_append_named_dword(GArray *array, const char *name_format, ...); > > > > > > > + > > > > > > > #endif > > > > > > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c > > > > > > > index 0d4b324..7f9fa65 100644 > > > > > > > --- a/hw/acpi/aml-build.c > > > > > > > +++ b/hw/acpi/aml-build.c > > > > > > > @@ -262,6 +262,32 @@ static void build_append_int(GArray *table, uint64_t value) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a qword, > > > > > > > + * and return the offset to 0x0 for runtime patching. > > > > > > > + * > > > > > > > + * Warning: runtime patching is best avoided. Only use this as > > > > > > > + * a replacement for DataTableRegion (for guests that don't > > > > > > > + * support it). > > > > > > > + */ > > > > > > > +int > > > > > > > +build_append_named_qword(GArray *array, const char *name_format, ...) > > > > > > > +{ > > > > > > > + int offset; > > > > > > > + va_list ap; > > > > > > > + > > > > > > > + va_start(ap, name_format); > > > > > > > + build_append_namestringv(array, name_format, ap); > > > > > > > + va_end(ap); > > > > > > > + > > > > > > > + build_append_byte(array, 0x0E); /* QWordPrefix */ > > > > > > > + > > > > > > > + offset = array->len; > > > > > > > + build_append_int_noprefix(array, 0x0, 8); > > > > > > > + assert(array->len == offset + 8); > > > > > > > + > > > > > > > + return offset; > > > > > > > +} > > > > > > > + > > > > > > > static GPtrArray *alloc_list; > > > > > > > > > > > > > > static Aml *aml_alloc(void) > > > > > > > > > > > > > > > > > > > -- > > > > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html