Ajay Kaher <akaher@xxxxxxxxxx> writes:

> During boot time there are many PCI config reads. These can be performed
> either using Port IO instructions (PIO) or memory-mapped I/O (MMIO).
>
> PIO is less efficient than MMIO: it requires twice as many PCI accesses,
> and PIO instructions are serializing. As a result, MMIO should be preferred
> over PIO when possible.
>
> Virtual machine test result using the VMware hypervisor.
> One hundred thousand reads using raw_pci_read() took:
> PIO:  12.809 seconds
> MMIO:  8.517 seconds (~33.5% faster than PIO)
>
> Currently, when these reads are performed by a virtual machine, they all
> cause a VM-exit, and therefore each one of them induces a considerable
> overhead.
>
> This overhead can be further reduced by mapping the MMIO region of the
> virtual machine to a memory area that holds the values the "emulated
> hardware" is supposed to return. The memory region is mapped as "read-only"
> in the NPT/EPT, so reads from these regions are treated as regular memory
> reads. Writes are still trapped and emulated by the hypervisor.
>
> Virtual machine test result with the above changes in the VMware hypervisor.
> One hundred thousand reads using raw_pci_read() took:
> PIO:  12.809 seconds
> MMIO:  0.010 seconds
>
> This helps to reduce virtual machine PCI scan and initialization time by
> ~65%. In our case it went down from ~55 msec to ~18 msec.
>
> MMIO is also faster than PIO on bare-metal systems, but due to some bugs
> with legacy hardware and the smaller gains on bare metal, it seems prudent
> not to change bare-metal behavior.

Out of curiosity, are we sure MMIO *always* works for other hypervisors
besides VMware? Various Hyper-V versions can probably be tested (were
they?), but with KVM it's much harder, as PCI is emulated in the VMM and
there is certainly more than one VMM in existence...

> Signed-off-by: Ajay Kaher <akaher@xxxxxxxxxx>
> ---
> v1 -> v2:
> Limit changes to apply only to VMs [Matthew W.]
> ---
>  arch/x86/pci/common.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index ddb7986..1e5a8f7 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -20,6 +20,7 @@
>  #include <asm/pci_x86.h>
>  #include <asm/setup.h>
>  #include <asm/irqdomain.h>
> +#include <asm/hypervisor.h>
>
>  unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
>  				PCI_PROBE_MMCONF;
> @@ -57,14 +58,58 @@ int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
>  	return -EINVAL;
>  }
>
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +static int vm_raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> +						int reg, int len, u32 *val)
> +{
> +	if (raw_pci_ext_ops)
> +		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> +	if (domain == 0 && reg < 256 && raw_pci_ops)
> +		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> +	return -EINVAL;
> +}
> +
> +static int vm_raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> +						int reg, int len, u32 val)
> +{
> +	if (raw_pci_ext_ops)
> +		return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> +	if (domain == 0 && reg < 256 && raw_pci_ops)
> +		return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> +	return -EINVAL;
> +}

These look exactly like raw_pci_read()/raw_pci_write(), but with inverted
priority.
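(For reference, and quoting mainline from memory, the existing
raw_pci_read() is roughly:

int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
						int reg, int len, u32 *val)
{
	if (domain == 0 && reg < 256 && raw_pci_ops)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	if (raw_pci_ext_ops)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	*val = -1;
	return -EINVAL;
}

i.e. raw_pci_ops is tried first and raw_pci_ext_ops is only the fallback.)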
We could've added a parameter, but to be more flexible I'd suggest we add a
'priority' field to 'struct pci_raw_ops' and make
raw_pci_read()/raw_pci_write() check it before deciding which one to use
first. To be on the safe side, you can leave raw_pci_ops's priority higher
than raw_pci_ext_ops's by default and only tweak it in
arch/x86/kernel/cpu/vmware.c. A rough (completely untested) sketch of what I
mean is at the very end of this mail.

> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
>  static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
>  {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +	/*
> +	 * MMIO is faster than PIO, but due to some bugs with legacy
> +	 * hardware, it seems prudent to prefer MMIO for VMs and PIO
> +	 * for bare-metal.
> +	 */
> +	if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +		return vm_raw_pci_read(pci_domain_nr(bus), bus->number,
> +					devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
>  	return raw_pci_read(pci_domain_nr(bus), bus->number,
>  				devfn, where, size, value);
>  }
>
>  static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
>  {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +	/*
> +	 * MMIO is faster than PIO, but due to some bugs with legacy
> +	 * hardware, it seems prudent to prefer MMIO for VMs and PIO
> +	 * for bare-metal.
> +	 */
> +	if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +		return vm_raw_pci_write(pci_domain_nr(bus), bus->number,
> +					devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
>  	return raw_pci_write(pci_domain_nr(bus), bus->number,
>  				devfn, where, size, value);
>  }

--
Vitaly
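Completely untested sketch of the 'priority' idea, just to illustrate it;
the field name, the helper name and the exact point where VMware would bump
the priority are placeholders, not a finished patch:

/* arch/x86/include/asm/pci_x86.h */
struct pci_raw_ops {
	int priority;	/* higher value is tried first; 0 by default */
	int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
		    int reg, int len, u32 *val);
	int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
		     int reg, int len, u32 val);
};

/* arch/x86/pci/common.c */
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
		 int reg, int len, u32 *val)
{
	/* Try raw_pci_ext_ops first only if someone raised its priority. */
	if (raw_pci_ext_ops && raw_pci_ops &&
	    raw_pci_ext_ops->priority > raw_pci_ops->priority)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	if (domain == 0 && reg < 256 && raw_pci_ops)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	if (raw_pci_ext_ops)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	*val = -1;
	return -EINVAL;
}

/* raw_pci_write() would get the same treatment. */

/* arch/x86/kernel/cpu/vmware.c -- called once MMCFG is known to be usable */
static void vmware_prefer_mmio_pci_cfg(void)
{
	if (raw_pci_ext_ops)
		raw_pci_ext_ops->priority = 1;
}

This keeps the current bare-metal ordering (raw_pci_ops first) everywhere
unless the VMware code explicitly opts in, and other hypervisors could do
the same if/when they know MMCFG is safe for them.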