On 7/9/2020 2:16 AM, Bjorn Helgaas wrote:
On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote:
On 7/6/2020 10:49 PM, David Miller wrote:
From: Aya Levin <ayal@xxxxxxxxxxxx>
Date: Mon, 6 Jul 2020 16:00:59 +0300
Assuming the discussions with Bjorn will conclude in a well-trusted
API that ensures relaxed ordering in enabled, I'd still like a method
to turn off relaxed ordering for performance debugging sake.
Bjorn highlighted the fact that the PCIe sub system can only offer a
query method. Even if theoretically a set API will be provided, this
will not fit a netdev debugging - I wonder if CPU vendors even support
relaxed ordering set/unset...
On the driver's side relaxed ordering is an attribute of the mkey and
should be available for configuration (similar to number of CPU
vs. number of channels).
Based on the above, and binding the driver's default relaxed ordering
to the return value from pcie_relaxed_ordering_enabled(), may I
continue with previous direction of a private-flag to control the
client side (my driver) ?
I don't like this situation at all.
If RO is so dodgy that it potentially needs to be disabled, that is
going to be an issue not just with networking devices but also with
storage and other device types as well.
Will every device type have a custom way to disable RO, thus
inconsistently, in order to accomodate this?
That makes no sense and is a terrible user experience.
That's why the knob belongs generically in PCI or similar.
Hi Bjorn,
Mellanox NIC supports relaxed ordering operation over DMA buffers.
However for debug prepossess we must have a chicken bit to disable
relaxed ordering on a specific system without effecting others in
run-time. In order to meet this requirement, I added a netdev
private-flag to ethtool for set RO API.
Dave raised a concern regarding embedding relaxed ordering set API
per system (networking, storage and others). We need the ability to
manage relaxed ordering in a unify manner. Could you please define a
PCI sub-system solution to meet this requirement?
I agree, this is definitely a mess. Let me just outline what I think
we have today and what we're missing.
- On the hardware side, device use of Relaxed Ordering is controlled
by the Enable Relaxed Ordering bit in the PCIe Device Control
register (or the PCI-X Command register). If set, the device is
allowed but not required to set the Relaxed Ordering bit in
transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec
7.2.3).
I suspect there may be device-specific controls, too, because [1]
claims to enable/disable Relaxed Ordering but doesn't touch the
PCIe Device Control register. Device-specific controls are
certainly allowed, but of course it would be up to the driver, and
the device cannot generate TLPs with Relaxed Ordering unless the
architected PCIe Enable Relaxed Ordering bit is *also* set.
- Platform firmware can enable Relaxed Ordering for a device either
before handoff to the OS or via the _HPX ACPI method.
- The PCI core never enables Relaxed Ordering itself except when
applying _HPX.
- At enumeration-time, the PCI core disables Relaxed Ordering in
pci_configure_relaxed_ordering() if the device is below a Root
Port that has a quirk indicating an erratum. This quirk currently
includes many Intel Root Ports, but not all, and is an ongoing
maintenance problem.
- The PCI core provides pcie_relaxed_ordering_enabled() which tells
you whether Relaxed Ordering is enabled. Only used by cxgb4 and
csio, which use that information to fill in Ingress Queue
Commands.
- The PCI core does not provide a driver interface to enable or
disable Relaxed Ordering.
- Some drivers disable Relaxed Ordering themselves: mtip32xx,
netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA),
tsi721, kp2000_pcie.
- Some drivers enable Relaxed Ordering themselves: niu, tegra.
What are we missing and what should the PCI core do?
- Currently the Enable Relaxed Ordering bit depends on what firmware
did. Maybe the PCI core should always clear it during
enumeration?
- The PCI core should probably have a driver interface like
pci_set_relaxed_ordering(dev, enable) that can set or clear the
architected PCI-X or PCIe Enable Relaxed Ordering bit.
- Maybe there should be a kernel command-line parameter like
"pci=norelax" that disables Relaxed Ordering for every device and
prevents pci_set_relaxed_ordering() from enabling it.
I'm mixed on this because these tend to become folklore about how
to "fix" problems and we end up with systems that don't work
unless you happen to find the option on the web. For debugging
issues, it might be enough to disable Relaxed Ordering using
setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0"
[1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@xxxxxxxxxxxx/
Hi Bjorn,
Thanks for the detailed reply. From initial testing I can say that
turning off the relaxed ordering on the PCI (setpci -s02:00.0
CAP_EXP+8.w=0) is the chicken bit I was looking for.
This lower the risk of depending on pcie_relaxed_ordering_enabled(). I
will update my patch and resubmit.
Thanks,
Aya