Hi Smita, My apologies for the delay! On Mon, May 22, 2023 at 03:23:31PM -0700, Smita Koralahalli wrote: > On 5/11/2023 4:19 AM, Lukas Wunner wrote: > > On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote: > > > Clear all capabilities in Device Control 2 register as they are optional > > > and it is not determined whether the next device inserted will support > > > these capabilities. Moreover, Section 6.13 of the PCIe Base > > > Specification [1], recommends clearing the ARI Forwarding Enable bit on > > > a hot-plug event as its not guaranteed that the newly added component > > > is in fact an ARI device. > > > > Clearing ARI Forwarding Enable sounds reasonable, but I don't see > > why all the other bits in the Device Control 2 register need to be > > cleared. If there isn't a reason to clear them, I'd be in favor of > > leaving them alone. > > I understand. The SPEC doesn't "clearly" state to clear the other bits > except ARI on a hot-plug event. > > But, we came across issues when a device with 10-bit tags was removed and > replaced with a device that didn't support 10-bit tags. The device became > inaccessible and the port was not able to be recovered without a system > reset. So, we thought it would be better to cherry pick all bits that were > negotiated between endpoint and root port and decided that we should clear > them all on removal. Hence, my first revision of this patch series had aimed > to clear only ARI, AtomicOp Requestor and 10 bit tags as these were the > negotiated settings between endpoint and root port. Ideally, these settings > should be re-negotiated and set up for optimal operation on a hot add. Makes total sense. I like the approach of clearing only these three bits, as you did in v1 of the patch. I also appreciate the detailed explanation that you've provided. Much of your e-mail can be copy-pasted to the commit message, in my opinion it's valuable information to any reviewer and future reader of the commit. > We had some internal discussions to understand if SPEC has it documented > somewhere. And we could see in Section 2.2.6.2, it implies that: > [i] If a Requester sends a 10-Bit Tag Request to a Completer that lacks > 10-Bit Completer capability, the returned Completion(s) will have Tags with > Tag[9:8] equal to 00b. Since the Requester is forbidden to generate these > Tag values for 10-Bit Tags, such Completions will be handled as Unexpected > Completions, which by default are Advisory Non-Fatal Errors. The Requester > must follow standard PCI Express error handling requirements. > [ii] In configurations where a Requester with 10-Bit Tag Requester > capability needs to target multiple Completers, one needs to ensure that the > Requester sends 10-Bit Tag Requests only to Completers that have 10-Bit Tag > Completer capability. > > Now, we might wonder, why clear them (especially 10-bit tags and AtomicOps) > if Linux hasn't enabled them at all as the "10-Bit Tag Requester Enable" bit > is not defined in Linux currently. But, these features might be enabled by > Platform FW for "performance reasons" if the endpoint supports and now it is > the responsibility of the operating system to disable it on a hot remove > event. Again, makes total sense. > According to implementation notes in 2.2.6.2: "For platforms where the RC > supports 10-Bit Tag Completer capability, it is highly recommended for > platform firmware or operating software that configures PCIe hierarchies to > Set the 10-Bit Tag Requester Enable bit automatically in Endpoints with > 10-Bit Tag Requester capability. This enables the important class of 10-Bit > Tag capable adapters that send Memory Read Requests only to host memory." So > if the endpoint and root port both support 10-bit tags BIOS is enabling it > at boot time.. > > I ran a quick check to see how DEV_CTL2 registers for root port look on a > 10-bit tag supported NVMe device. > > pci 0000:80:05.1: DEVCTL2 0x1726 (Bit 12: 10-bit tag is enabled..) > pci 0000:80:05.1: DEVCAP2 0x7f19ff > > So, it seems like BIOS has enabled 10-bit tags at boot time even though > Linux hasn't enabled it. > > Some couple of ways we think could be: > [1] Check if these bits are enabled by Platform at boot time, clear them > only it is set during hotplug flow. > [2] Clear them unconditionally as I did.. > [3] Enable 10-bits tags in Linux when a device is probed just like how we do > for ARI.. > > Similarly call pci_enable_atomic_ops_to_root() during a hot add.. Personally I'm fine with option [2]. If you or Bjorn prefer option [3], I'm fine with that as well. > > As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc > > ("PCI: Enable ARI if dev and upstream bridge support it; disable > > otherwise") already solved this problem. Quoth its commit message: [...] > > My superficial understanding of that patch is that we do find function 0, > > while enumerating it we clear the ARI Forwarding Enable bit and thus the > > remaining functions become accessible and are subsequently enumerated. > > > > Are you seeing issues when removing an ARI-capable endpoint from a > > hotplug slot and replacing it with a non-ARI-capable device? > > If you do, the question is why the above-quoted commit doesn't avoid them. > > Yeah! Sorry I missed this. ARI is already checked and enabled during device > initialization. It doesn't hurt to additionally clear on hot-removal. Thanks, Lukas