On Jul 8, 2022, at 5:56 AM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

> And looking at the results above, it's not so much the PIO vs MMIO
> that makes a difference, it's the virtualisation. A mmio access goes
> from 269ns to 85us. Rather than messing around with preferring MMIO
> over PIO for config space, having an "enlightenment" to do config
> space accesses would be a more profitable path.

I am unfamiliar with the motivation for this patch, but I just wanted to briefly address the advice about enlightenments.

“Enlightenment”, AFAIK, is Microsoft’s term for "para-virtualization", so let’s use the generic term. I think that you regard the bare-metal results as what a paravirtual machine could achieve, which is mostly wrong. Para-virtualization usually still requires a VM-exit, and for the most part the hypervisor/host runs similar code for MMIO/hypercall (conceptually; the code of paravirtual and fully-virtual devices is often different, but IIUC, this is not what Ajay measured).

Para-virtualization could *perhaps* have helped to reduce the number of PIO/MMIO accesses and improve performance this way. If, for instance, all the PIO/MMIO accesses are done during initialization, a paravirtual interface can be used to batch them together, and that would help. But it is more complicated to get a performance benefit from para-virtualization if the PIO/MMIO accesses are “spread”, for instance, done after each interrupt.

Para-virtualization and full-virtualization both have pros and cons. Para-virtualization is many times more efficient, but requires the VM to have dedicated device drivers for that purpose. Try to run an OS less common than Linux and it would not work, since the OS would not have drivers for the para-virtual devices. And even if you add support today for a para-virtual device, there are many deployed OSes that do not have such support, and you would not be able to run them in a VM.
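As a rough illustration of the batching point above, here is a back-of-envelope sketch in Python. The 85us per-access cost is the virtualized figure quoted earlier in this thread; the batched interface itself is hypothetical, and the model assumes (which may well be optimistic) that one batched call costs about as much as one trapped access.

```python
# Back-of-envelope model only; not a measurement.
# 85us is the virtualized MMIO access cost quoted in this thread.
EXIT_COST_US = 85.0


def unbatched_cost_us(n_accesses: int) -> float:
    """Every config-space access traps to the hypervisor on its own."""
    return n_accesses * EXIT_COST_US


def batched_cost_us(n_accesses: int, batch_size: int) -> float:
    """Hypothetical paravirtual call carrying up to batch_size accesses per exit.

    Assumption: one batched call costs about the same as one trapped access.
    """
    n_exits = -(-n_accesses // batch_size)  # ceiling division
    return n_exits * EXIT_COST_US


if __name__ == "__main__":
    n = 1000
    print("unbatched:", unbatched_cost_us(n), "us")
    # All accesses happen at initialization and fit into one batch.
    print("batched at init:", batched_cost_us(n, batch_size=n), "us")
    # Accesses are spread (e.g. one per interrupt), so nothing can be batched.
    print("spread accesses:", batched_cost_us(n, batch_size=1), "us")
```

With a batch size of 1, which is what "spread" accesses amount to, the paravirtual path in this model costs exactly as much as the trapped path, since each access still pays for its own exit.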

Regardless of virtualization, Ajay’s results show that PIO is slower on bare metal, by 165ns according to his numbers, which is significant. Emulating PIO in hypervisors on x86 is inherently more complex than emulating MMIO, so the results he got would most likely be seen on all hypervisors.

tl;dr: Let’s keep this discussion focused and put para-virtualization aside. It is not a solution for all the problems in the world.