Thanks for the suggestions! > The calling convention of __pci_read_base() is already changing if we're having the caller disable decoding The way I implemented that in my initial patch draft[0] still allows for __pci_read_base() to be called independently, as it was originally, since (as far as I understand) the encode disable/enable is just a mask - so I didn't need to remove the disable/enable inside __pci_read_base(), and instead just added an extra one in pci_read_bases(), turning the __pci_read_base() disable/enable into a no-op when called from pci_read_bases(). In any case... > I think maybe another alternative that doesn't hold off the console would be to split the BAR sizing and resource processing into separate steps. This seems like a potentially better option, so I'll dig into that approach. Providing some additional info you requested last week, just for more context: > Do you have similar logs from that [hotplug] operation Attached [1] is the guest boot output (boot was quick, since no GPUs were attached at boot time) Here is what I see when I enable a single GPU (takes 1-3 seconds): [Dec 3 22:53] pci 0000:09:00.0: [10de:2330] type 00 class 0x030200 PCIe Endpoint [ +0.000526] pci 0000:09:00.0: BAR 0 [mem 0x00000000-0x00ffffff 64bit pref] [ +0.000237] pci 0000:09:00.0: BAR 2 [mem 0x00000000-0x1fffffffff 64bit pref] [ +0.000212] pci 0000:09:00.0: BAR 4 [mem 0x00000000-0x01ffffff 64bit pref] [ +0.000275] pci 0000:09:00.0: Max Payload Size set to 128 (was 256, max 256) [ +0.000453] pci 0000:09:00.0: Enabling HDA controller [ +0.003052] pci 0000:09:00.0: 252.048 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x16 link at 0000:00:02.0 (capable of 504.112 Gb/s with 32.0 GT/s PCIe x16 link) [ +0.003327] pci 0000:09:00.0: BAR 2 [mem 0x384000000000-0x385fffffffff 64bit pref]: assigned [ +0.000258] pci 0000:09:00.0: BAR 4 [mem 0x386000000000-0x386001ffffff 64bit pref]: assigned [ +0.000311] pci 0000:09:00.0: BAR 0 [mem 0x386002000000-0x386002ffffff 64bit pref]: assigned [ +0.000993] nvidia 0000:09:00.0: enabling device (0140 -> 0142) And below[2] is the output of /proc/iomem after I have done that process 4 times (GPUs are 0000:08, 0000:09, 0000:0a, 0000:0b) [0] https://raw.githubusercontent.com/MitchellAugustin/script-dump-public/refs/heads/main/slow-ovmf-case/patches/0001-Move-decode-disable-enable-up-one-level-and-add-kern.patch [1] https://pastebin.ubuntu.com/p/T4myPSQSJ5/ [2] https://pastebin.ubuntu.com/p/GD2WtkW9Gq/ Other info: Guest: ubuntu@testbox02:~$ cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-6.12.1 root=UUID=fec1c9ae-0df3-419c-80dd-f3035049b845 ro console=tty1 console=ttyS0 Host (DGX H100): $ cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-6.12.0+ root=UUID=bb04f707-1c00-4806-8adb-cf9eb876786f ro console=ttyS0,115200n8 iommu=pt init_on_alloc=0 - Mitchell Augustin On Tue, Dec 3, 2024 at 4:06 PM Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > On Tue, 3 Dec 2024 14:33:10 -0600 > Mitchell Augustin <mitchell.augustin@xxxxxxxxxxxxx> wrote: > > > Thanks. > > > > I'm thinking about the cleanest way to accomplish this: > > > > 1. I'm wondering if replacing the pci_info() calls with equivalent > > printk_deferred() calls might be sufficient here. This works in my > > initial test, but I'm not sure if this is definitive proof that we > > wouldn't have any issues in all deployments, or if my configuration is > > just not impacted by this kind of deadlock. > > Just switching to printk_deferred() alone seems like wishful thinking, > but if you were also to wrap the code in console_{un}lock(), that might > be a possible low-impact solution. > > > 2. I did also draft a patch that would just eliminate the redundancy > > and disable the impacted logs by default, and allow them to be > > re-enabled with a new kernel command line option > > "pci=bar_logging_enabled" (at the cost of the performance gains due to > > reduced redundancy). This works well in all of my tests. > > I suspect Bjorn would prefer not to add yet another pci command line > option and as we've seen here, the logs are useful by default. > > > Do you think either of those approaches would work / be appropriate? > > Ultimately I am trying to avoid messy changes that would require > > actually propagating all of the info needed for these logs back up to > > pci_read_bases(), if at all possible, since there seems like no > > obvious way to do that without changing the signature of > > __pci_read_base() or tracking additional state. > > The calling convention of __pci_read_base() is already changing if > we're having the caller disable decoding and it doesn't have a lot of > callers, so I don't think I'd worry about changing the signature. > > I think maybe another alternative that doesn't hold off the console > would be to split the BAR sizing and resource processing into separate > steps. For example pci_read_bases() might pass arrays like: > > u32 bars[PCI_STD_NUM_BARS] = { 0 }; > u32 romsz = 0; > > To a function like: > > void __pci_read_bars(struct pci_dev *dev, u32 *bars, u32 *romsz, > int num_bars, int rom) > { > u16 orig_cmd; > u32 tmp; > int i; > > if (!dev->mmio_always_on) { > pci_read_config_word(dev, PCI_COMMAND, &orig_cmd); > if (orig_cmd & PCI_COMMAND_DECODE_ENABLE) { > pci_write_config_word(dev, PCI_COMMAND, > orig_cmd & ~PCI_COMMAND_DECODE_ENABLE); > } > } > > for (i = 0; i < num_bars; i++) { > unsigned int pos = PCI_BASE_ADDRESS_0 + (i << 2); > > pci_read_config_dword(dev, pos, &tmp); > pci_write_config_dword(dev, pos, ~0); > pci_read_config_dword(dev, pos, &bars[i]); > pci_write_config_dword(dev, pos, tmp); > } > > if (rom) { > pci_read_config_dword(dev, rom, &tmp); > pci_write_config_dword(dev, rom, PCI_ROM_ADDRESS_MASK); > pci_read_config_dword(dev, rom, romsz); > pci_write_config_dword(dev, rom, tmp); > } > > if (!dev->mmio_always_on && (orig_cmd & PCI_COMMAND_DECODE_ENABLE)) > pci_write_config_word(dev, PCI_COMMAND, orig_cmd); > } > > pci_read_bases() would then iterate in a similar way that it does now, > passing pointers to the stashed data to __pci_read_base(), which would > then only do the resource processing and could freely print. > > To me that seems better than blocking the console... Maybe there are > other ideas on the list. Thanks, > > Alex > -- Mitchell Augustin Software Engineer - Ubuntu Partner Engineering Email:mitchell.augustin@xxxxxxxxxxxxx Location:United States of America canonical.com ubuntu.com