On Tue, 2021-10-05 at 15:11 -0500, Bjorn Helgaas wrote:
> On Mon, Oct 04, 2021 at 08:51:06PM +0000, Kelvin.Cao@xxxxxxxxxxxxx wrote:
> > On Sat, 2021-10-02 at 10:11 -0500, Bjorn Helgaas wrote:
> > > On Fri, Oct 01, 2021 at 11:49:18PM +0000, Kelvin.Cao@xxxxxxxxxxxxx wrote:
> > > > On Fri, 2021-10-01 at 14:29 -0600, Logan Gunthorpe wrote:
> > > > > On 2021-10-01 2:18 p.m., Bjorn Helgaas wrote:
> > > > > > On Fri, Sep 24, 2021 at 11:08:38AM +0000, kelvin.cao@xxxxxxxxxxxxx wrote:
> > > > > > > From: Kelvin Cao <kelvin.cao@xxxxxxxxxxxxx>
> > > > > > >
> > > > > > > After a firmware hard reset, MRPC command executions,
> > > > > > > which are based on the PCI BAR (which Microchip refers to
> > > > > > > as GAS) read/write, will hang indefinitely. This is
> > > > > > > because after a reset, the host will fail all GAS reads
> > > > > > > (get all 1s), in which case the driver won't get a valid
> > > > > > > MRPC status.
> > > > > >
> > > > > > Trying to write a merge commit log for this, but having a
> > > > > > hard time summarizing it. It sounds like it covers both
> > > > > > Switchtec-specific (firmware and MRPC commands) and generic
> > > > > > PCIe behavior (MMIO read failures).
> > > > > >
> > > > > > This has something to do with a firmware hard reset. What
> > > > > > is that? Is that like a firmware reboot? A device reset,
> > > > > > e.g., FLR or secondary bus reset, that causes a firmware
> > > > > > reboot? A device reset initiated by firmware?
> > > >
> > > > A firmware reset can be triggered by a reset command issued to
> > > > the firmware to reboot it.
> > >
> > > So I guess this reset command was issued by the driver?
> >
> > Yes, the reset command can be issued by a userspace utility to the
> > firmware via the driver. In some other cases, a user can also issue
> > the reset command, via a sideband interface (like UART), to the
> > firmware.
>
> OK, thanks. That means CRS is not a factor here because this is not
> an FLR or similar reset.
>
> > > > > > Anyway, apparently when that happens, MMIO reads to the
> > > > > > switch fail (timeout or error completion on PCIe) for a
> > > > > > while. If a device reset is involved, that much is standard
> > > > > > PCIe behavior. And the driver sees ~0 data from those
> > > > > > failed reads. That's not part of the PCIe spec, but is
> > > > > > typical root complex behavior.
> > > > > >
> > > > > > But you said the MRPC commands hang indefinitely. Presumably
> > > > > > MMIO reads would start succeeding eventually when the device
> > > > > > becomes ready, so I don't know how that translates to
> > > > > > "indefinitely."
> > > > >
> > > > > I suspect Kelvin can expand on this and fix the issue below.
> > > > > But in my experience, the MMIO will read ~0 forever after a
> > > > > firmware reset, until the system is rebooted. Presumably on
> > > > > systems that have good hot plug support they are supposed to
> > > > > recover. Though I've never seen that.
> > > >
> > > > This is also my observation, all MMIO reads will fail (~0
> > > > returned) until the system is rebooted or a PCI rescan is
> > > > performed.
> > >
> > > This made sense until you said MMIO reads would start working
> > > after a PCI rescan. A rescan doesn't really do anything special
> > > other than doing config accesses to the device. Two things come
> > > to mind:
> > >
> > > 1) Rescan does a config read of the Vendor ID, and devices may
> > > respond with "Configuration Request Retry Status" if they are not
> > > ready. In this event, Linux retries this for a while. This
> > > scenario doesn't quite fit because it sounds like this is a
> > > device-specific reset initiated by the driver, and CRS is not
> > > permitted in this case. PCIe r5.0, sec 2.3.1, says:
> > >
> > >   A device Function is explicitly not permitted to return CRS
> > >   following a software-initiated reset (other than an FLR) of the
> > >   device, e.g., by the device's software driver writing to a
> > >   device-specific reset bit.
> > >
> > > 2) The device may lose its bus and device number configuration
> > > after a reset, so it must capture bus and device numbers from
> > > config writes. I don't think Linux does this explicitly, but a
> > > rescan does do config writes, which could accidentally fix
> > > something (PCIe r5.0, sec 2.2.9).
> >
> > Thanks Bjorn. It makes perfect sense!
> >
> > > > > The MMIO read that signals the MRPC status always returns ~0
> > > > > and the userspace request will eventually time out.
> > > >
> > > > The problem in this case is that, in DMA MRPC mode, the status
> > > > (in host memory) is always initialized to 'in progress', and
> > > > it's up to the firmware to update it to 'done' after the
> > > > command is executed in the firmware. After a firmware reset is
> > > > performed, the firmware cannot be triggered to start a MRPC
> > > > command, therefore the status in host memory remains 'in
> > > > progress' in the driver, which prevents a MRPC from timing out.
> > > > I should have included this in the message.
> > >
> > > I *thought* the problem was that the PCIe Memory Read failed and
> > > the Root Complex fabricated ~0 data to complete the CPU read. But
> > > now I'm not sure, because it sounds like it might be that the
> > > PCIe transaction succeeds, but it reads data that hasn't been
> > > updated by the firmware, i.e., it reads 'in progress' because
> > > firmware hasn't updated it to 'done'.
> >
> > The original message was sort of misleading. After a firmware
> > reset, CPU getting ~0 for the PCIe Memory Read doesn't explain the
> > hang. In a MRPC execution (DMA MRPC mode), the MRPC status, which
> > is located in the host memory, gets initialized by the CPU and
> > updated/finalized by the firmware. In the situation of a firmware
> > reset, any MRPC initiated afterwards will not get the status
> > updated by the firmware per the reason you pointed out above (or
> > similar, to my understanding, firmware can no longer DMA data to
> > host memory in such cases), therefore the MRPC execution will never
> > end.
>
> I'm glad this makes sense to you, because it still doesn't to me.
>
> check_access() does an MMIO read to something in BAR0. If that read
> returns ~0, it means either the PCIe Memory Read was successful and
> the Switchtec device supplied ~0 data (maybe because firmware has not
> initialized that part of the BAR) or the PCIe Memory Read failed and
> the root complex fabricated the ~0 data.
>
> I'd like to know which one is happening so we can clarify the commit
> log text about "MRPC command executions hang indefinitely" and "host
> will fail all GAS reads." It's not clear whether these are PCIe
> protocol issues or driver/firmware interaction issues.
>
> Bjorn

I think it's the latter case: the ~0 data was fabricated by the root
complex, as the MMIO read in check_access() always returns ~0 until a
reboot or a rescan happens.

Kelvin
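
For readers following the thread, the two symptoms discussed above can
be separated with a small userspace model. This is only a sketch under
stated assumptions, not the driver's actual code: struct fake_dev,
mrpc_submit(), mrpc_poll(), the status values and the result codes are
all hypothetical names made up for illustration. It models the status
word in host memory staying 'in progress' when firmware never updates
it, and an MMIO read that comes back as all 1s when the read fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status values; the real driver's constants may differ. */
enum mrpc_status { MRPC_INPROGRESS = 1, MRPC_DONE = 2 };

/* Hypothetical result codes used only by this sketch. */
enum poll_result { POLL_DONE = 0, POLL_TIMEOUT = -1, POLL_NO_ACCESS = -2 };

/* Simulated device state standing in for real hardware. */
struct fake_dev {
	uint32_t bar_reg;             /* what an MMIO read of BAR0 returns */
	volatile uint32_t dma_status; /* status word in host memory */
};

/* All 1s is what a typical root complex returns for a failed read. */
static bool mmio_read_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/* The host initializes the status; only firmware moves it to 'done'. */
static void mrpc_submit(struct fake_dev *dev)
{
	assert(dev);
	dev->dma_status = MRPC_INPROGRESS;
}

/*
 * Poll the host-memory status a bounded number of times. The status
 * alone cannot reveal an unreachable device, so each iteration also
 * probes the BAR and bails out if the read looks fabricated.
 */
static enum poll_result mrpc_poll(struct fake_dev *dev, unsigned int max_polls)
{
	assert(dev);
	for (unsigned int i = 0; i < max_polls; i++) {
		if (dev->dma_status == MRPC_DONE)
			return POLL_DONE;
		if (mmio_read_failed(dev->bar_reg))
			return POLL_NO_ACCESS;
	}
	return POLL_TIMEOUT;
}
```

Without the BAR probe and the poll bound, a status word that firmware
never touches again would keep the loop waiting, which is the "never
ends" case described above. Note that an all-1s value can also be
legitimate register data, so the probe is only a heuristic.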