On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote:
> On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> > [+cc NVMe folks]
> >
> > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > > On 14/06/2022 12:47, Hans de Goede wrote:
> > > > [...]
> > > >
> > > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > > bugzilla? Everything there seems to work just fine and then the system
> > > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > > an NVME issue ?
> > >
> > > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > > device doesn't work, hence no rootfs =)
> >
> > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@xxxxxxxxxx
> >
> > Steam Deck broke because we erroneously trimmed out the PCI host
> > bridge window where BIOS had placed most devices, successfully
> > reassigned all the PCI bridge windows and BARs, but some devices,
> > apparently including NVMe, didn't work at the new addresses.
> >
> > Do you NVMe folks know of gotchas in this area? I want to know
> > because we'd like to be able to move devices around someday to
> > make room for hot-added devices.
> >
> > This reassignment happened before drivers claimed the devices, so
> > from a PCI point of view, I don't know why the NVMe device
> > wouldn't work at the new address.
>
> The probe status quickly returns ENODEV. Based on the output (we
> don't log much, so this is just an educated guess), I think that
> means the driver read all F's from the status register, which
> indicates we can't read it when using the reassigned memory window.
>
> Why changing memory windows may not work tends to be platform or
> device specific. Considering the renumbered windows didn't cause a
> problem for other devices, it sounds like this nvme device may be
> broken.
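
For anyone following along, here is a rough userspace sketch of the
failure mode Keith describes. It is not the nvme driver's code: the
helper name and the macro are made up for illustration. The assumption
is that a 32-bit read from an address nothing decodes comes back with
all bits set, so the probe path treats all F's in the status register
as "device not reachable" and bails out with ENODEV.

```c
#include <errno.h>
#include <stdint.h>

/* Value a 32-bit MMIO read typically returns when no device responds. */
#define ALL_ONES_32 UINT32_C(0xFFFFFFFF)

/*
 * Hypothetical helper (not a kernel function): takes the value read from
 * the controller status register and returns 0 if the device looks
 * reachable, or -ENODEV if the read came back as all F's.
 */
static int check_status_readable(uint32_t csts)
{
	if (csts == ALL_ONES_32)
		return -ENODEV;
	return 0;
}
```

Any other value, including 0, passes this check; the sketch only
models the "can't read the register at all" case, not a controller
that is reachable but unhealthy.
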

It sounds like you've seen this sort of problem before, so we
shouldn't assume that it's safe to reassign BARs.

I think Windows supports rebalancing, but it does look like drivers
have the ability to veto it:

  https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/stopping-a-device-to-rebalance-resources
  https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/the-pnp-manager-redistributes-system-resources

So I suppose if/when we support rebalancing, it'll have to be an
opt-in thing for each driver.
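
To make the opt-in idea concrete, a purely hypothetical sketch follows.
Nothing here exists in the kernel today: the struct, the
allow_rebalance flag and may_rebalance() are invented names, and the
rule for devices with no driver bound is my own conservative
assumption rather than anything decided in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver descriptor; this is NOT the kernel's struct pci_driver. */
struct sketch_driver {
	const char *name;
	bool allow_rebalance;	/* hypothetical opt-in flag, false unless set */
};

/*
 * Decide whether a device's BARs may be moved. Only a driver that has
 * explicitly opted in permits it. With no driver bound (drv == NULL)
 * there is nobody to ask, so this sketch leaves the device where it is.
 */
static bool may_rebalance(const struct sketch_driver *drv)
{
	if (drv == NULL)
		return false;
	return drv->allow_rebalance;
}
```

The point of defaulting to false is that existing drivers keep
today's behavior, and only drivers that have been checked against
moved BARs would set the flag.
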