> On 13/09/22, 7:05 PM, "Vitaly Kuznetsov" <vkuznets@xxxxxxxxxx> wrote: >> >> Thanks Vitaly for your response. >> >> 1. we have multiple objects of struct pci_raw_ops, 2. adding 'priority' field to struct pci_raw_ops >> doesn't seems to be appropriate as need to take decision which object of struct pci_raw_ops has >> to be used, not something with-in struct pci_raw_ops. > > I'm not sure I follow, you have two instances of 'struct pci_raw_ops' > which are called 'raw_pci_ops' and 'raw_pci_ext_ops'. What if you do > something like (completely untested): > > diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h > index 70533fdcbf02..fb8270fa6c78 100644 > --- a/arch/x86/include/asm/pci_x86.h > +++ b/arch/x86/include/asm/pci_x86.h > @@ -116,6 +116,7 @@ extern void (*pcibios_disable_irq)(struct pci_dev *dev); > extern bool mp_should_keep_irq(struct device *dev); > > struct pci_raw_ops { > + int rating; > int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn, > int reg, int len, u32 *val); > int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn, > diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c > index ddb798603201..e9965fd11576 100644 > --- a/arch/x86/pci/common.c > +++ b/arch/x86/pci/common.c > @@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops; > int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn, > int reg, int len, u32 *val) > { > - if (domain == 0 && reg < 256 && raw_pci_ops) > + if (domain == 0 && reg < 256 && raw_pci_ops && > + (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating)) > return raw_pci_ops->read(domain, bus, devfn, reg, len, val); > if (raw_pci_ext_ops) > return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val); > @@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn, > int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn, > int reg, int len, u32 val) > { > - if (domain == 0 && reg < 256 && raw_pci_ops) > + if (domain == 0 && reg < 256 && raw_pci_ops && > + (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating)) > return raw_pci_ops->write(domain, bus, devfn, reg, len, val); > if (raw_pci_ext_ops) > return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val); > > and then somewhere in Vmware hypervisor initialization code > (arch/x86/kernel/cpu/vmware.c) you do > > raw_pci_ext_ops->rating = 100; Thanks Vitaly, for your review and helping us to improve the code. I was working to make changes as you suggested, but before sending v3 would like to discuss on following: If we add rating with-in struct pci_raw_ops then we can't have pci_mmcfg as const, and following change is must in arch/x86/pci/mmconfig_64.c: -const struct pci_raw_ops pci_mmcfg = { +struct pci_raw_ops pci_mmcfg = { .read = pci_mmcfg_read, .write = pci_mmcfg_write, }; So to avoid this change, is it fine to have global bool prefer_raw_pci_ext_ops? And raw_pci_read() will have following change: - if (domain == 0 && reg < 256 && raw_pci_ops) + if (domain == 0 && reg < 256 && raw_pci_ops && + (!prefer_raw_pci_ext_ops || !raw_pci_ext_ops) > > why wouldn't it work? > > (diclaimer: completely untested, raw_pci_ops/raw_pci_ext_ops > initialization has to be checked so 'rating' is not garbage). > >> >> It's a generic solution for all hypervisor (sorry for earlier wrong >> Subject), not specific to VMware. Further looking for feedback if it's >> impacting to any hypervisor. > > That's the tricky part. We can check modern hypervisor versions, but > what about all other versions in existence? How can we know that there's > no QEMU/Hyper-V/... version out there where MMIO path is broken? I'd > suggest we limit the change to Vmware hypervisor, other hypervisors may > use the same mechanism (like the one above) later (but the person > suggesting the patch is always responsible for the research why it is > safe to do so). Ok, as of now we will make this change specific to VMware hypervisor. - Ajay