Re: One Question About PCIe BUS Config Type with pcie_bus_safe or pcie_bus_perf On NVMe Device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 22, 2018 at 06:24:22PM -0500, Sinan Kaya wrote:
> On 1/22/2018 5:51 PM, Bjorn Helgaas wrote:
> > On Mon, Jan 22, 2018 at 05:04:03PM -0500, Sinan Kaya wrote:
> >> On 1/22/2018 4:36 PM, Bjorn Helgaas wrote:
> 
> >>> Reducing MPS may be necessary if there are several devices in the
> >>> hierarchy and one requires a smaller MPS than the others.  That
> >>> obviously reduces the maximum read and write performance.
> >>>
> >>> Reducing the MRRS may be useful to prevent one device from hogging
> >>> a link, but of course, it reduces read performance for that device
> >>> because we need more read requests.
> >>
> >> Maybe, a picture could help.
> >>
> >>                root (MPS=256)
> >>                  |
> >>          ------------------
> >>         /                  \
> >>    bridge0 (MPS=256)      bridge1 (MPS=128)
> >>       /                       \
> >>     EP0 (MPS=256)            EP1 (MPS=128)
> >>
> >> If I understood this right, code allows the configuration above with
> >> the performance mode so that MPS doesn't have to be uniform across
> >> the tree. 
> > 
> > Yes.  In PERFORMANCE mode, we will set EP1's MRRS=128 and
> > EP0's MRRS=256, just as you show.
> > 
> >> It just needs to be consistent between the root port and endpoints.
> > 
> > No, it doesn't need to be consistent.  In PERFORMANCE mode, we'll set
> > the root's MPS=256 and EP1's MPS=128.
> > 
> > (I'm not actually 100% convinced that the PERFORMANCE mode approach of
> > reducing MRRS is safe, necessary, and maintainable.  I suspect that in
> > many of the interesting cases, the device we care about is the only
> > one below a Root Port, and we can get the performance we need by
> > maximizing MPS and MRRS for that Root Port and its children,
> > independent of the rest of the system.)
> 
> Maybe, I started seeing more and more NVMe devices behind a switch every
> day. That's why, I'm concerned.

Are there a mix of devices that support large MPS and those that only
support a smaller MPS behind the same switch?

I don't have any actual data about topologies of interest.  I'm just
guessing that high-performance devices will often have their own root
port, without being mixed in with low-speed devices.  It's OK to have
switches in the path; it's the mixing high-speed with low-speed
devices that causes the problems.

> >> Why are we reducing MRRS in this case?
> > 
> > We have to set EP1's MRRS=128 so it will never receive a completion
> > larger than 128 bytes.  If we set EP1's MRRS=256, it could receive
> > 256-byte TLPs, which it would treat as malformed.  (We also assume no
> > peer-to-peer DMA that targets EP1.)
> 
> What if we were to keep root port MPS as 128? and not touch the BIOS
> configured MRRS (4k) ?
> 
> Everybody should be happy, right?

No.  In your picture (which helps a lot, thank you!), if you set the
root's MPS=128, EP1 will be fine no matter what its MRRS is because
the entire path has MPS=128.

But if EP0's MRRS is larger than 128, it's in trouble because it may
issue a 256-byte DMA write, which the root port will treat as
malformed.  The root port, as the receiver of that Memory Write
Request, is required to check the TLP against the MPS in its Device
Control (not Device Capability) register (this is in PCIe r4.0, sec
2.2.2).

> I know there is a rule to check the completions against MPS. 

Sec 2.2.2 says a receiver must check "data payloads".  I think that
includes both Completions and Memory Writes.

> Root port could generate transactions that is a multiple of 128
> bytes for reads.

If the root port generates a 256-byte Memory Read request to EP1,
that's fine because EP1 will only respond with 128-byte completions.
If it sends that 256-byte Memory Read to EP0, we have a problem
because EP0 may generate a 256-byte completion, which will cause an
error if the root port has MPS=128.

> Is there any rule against checking incoming writes?

Sec 2.2.2 says a receiver *must* check the payload size of incoming
Memory Write requests.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux