On Tue, Oct 04, 2022 at 06:48:11PM +0000, Nadav Amit wrote: > On Oct 4, 2022, at 1:22 AM, Alexander Graf <graf@xxxxxxxxxx> wrote: > > > ⚠ External Email > > > > Hey Nadav, > > > > On 03.10.22 19:34, Nadav Amit wrote: > >> On Oct 3, 2022, at 8:03 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote: > >> > >>> Not my but rather PCI maintainer's call but IMHO dropping 'const' is > >>> better, introducing a new global var is our 'last resort' and should be > >>> avoided whenever possible. Alternatively, you can add a > >>> raw_pci_ext_ops_preferred() function checking somethin within 'struct > >>> hypervisor_x86' but I'm unsure if it's better. > >>> > >>> Also, please check Alex' question/suggestion. > >> Here is my take (and Ajay knows probably more than me): > >> > >> Looking briefly on MCFG, I do not see a clean way of using the ACPI table. > >> The two options are either to use a reserved field (which who knows, might > >> be used one day) or some OEM ID. I am also not familiar with > >> PCI_COMMAND.MEMORY=0, so Ajay can hopefully give some answer about that. > >> > >> Anyhow, I understand (although not relate) to the objection for a new global > >> variable. How about explicitly calling this hardware bug a “bug” and using > >> the proper infrastructure? Calling it explicitly a bug may even push whoever > >> can to resolve it. > > > > > > I am a lot more concerned with how we propagate it externally than > > within Linux. If we hard code that all Linux kernels 6.2+ that are > > running in VMware prefer ECAM over PIO, we lock ourselves into that > > stance for better or worse, which means: > > > > * All past and future versions of any VMware hypervisor product have to > > always allow ECAM access for any PCIe config space write > > * No other hypervisor benefits from any of this without upstream code change > > * No real hardware platform benefits from this without upstream code change > > > > By moving it into MCFG, we can create a path for the outside environment > > to tell the OS whether it's safe to use ECAM always. This obviously > > doesn't work with MCFG as it stands today, we'd have to propose an MCFG > > spec change to the PCI SIG's "PCI Firmware Specification" to add the > > respective field. Future VMware versions could then always expose the > > flag - and if you find it broken, remove it again. > > > > Putting all of the logic on which system potentially prefers ECAM over > > PIO config space access into Linux is just a big hack that we should > > avoid as much as possible. > > Thanks Alex. You raise important points. Let me try to break down your > concerns slightly differently: > > 1. Enabling MMIO access should be selective, and potentially controlled by > the hypervisor. The very least a "chicken-bit” is needed. > > 2. PCI SIG would change its specifications to address unclear hardware bug. > > I think (1) makes sense and we can discuss different ways of addressing it. > But (2) would not happen in a reasonable timeline and seems to me as an > unnecessary complication. > > But before we discuss how to address the issue, perhaps we need to first > understand it better. I am not sure that I understand this MMIO bug, and so > far nobody was able to provide exact details. > > So I went to have a look. It might not be super helpful, but for the record, > here is what I collected. > > First, we have commit d6ece5491ae71d ("i386/x86-64 Correct for broken MCFG > tables on K8 systems”). It tried to "try to discover all devices on bus 0 > that are unreachable using MM and fallback for them.” Interestingly, it > seems similar to FreeBSD code (commit 2d10570afe2b3e) that also mentions K8 > and has similar detection logic in FreeBSD’s pcie_cfgregopen(). > > Then commit a0ca9909609470 ("PCI x86: always use conf1 to access config > space below 256 bytes”). The correspondence [1] mentions some bugs: ATI > chipset, VIA chipset, Intel 3 Series Express chipset family and some reports > on Nvidia. It turned out some devices had problem probing - to figure out if > MMIO is broken - the way the previous patch did. There's also a statement by Linus that MCFG might not cover all buses in that thread. I didn't think the implications through yet ... > All of these bugs are circa 2008, of course. And note that FreeBSD did not > take a similar path. The correspondence around Linux patch is endless. I > admit that I did not understand whether eventually the issues were found to > be per-bus or per-device. > > > Back to the matter at hand. The benefit of using the MCFG approach that you > propose is that it can enable native systems to use MMIO as well. However, > since the list of bugs is unclear and the problems might be device-specific, > it is not clear what information BIOSes have that Linux doesn’t. In other > words, the benefit of getting it into the specifications is questionable, > and the complexity+time is high. > > Can we agree that the feature would be enabled explicitly by the hypervisor > and Linux would enable it based on the hypervisor input (through some > channel?) > > Thanks, > Nadav > > [1] https://lore.kernel.org/all/20080112144030.GA19279@xxxxxxxxxxxxxxxxxxxx/T/#u