On Sat, Jan 20, 2018 at 02:20:06PM -0500, Sinan Kaya wrote:
> On 1/19/2018 3:51 PM, Bjorn Helgaas wrote:
> > Consider a switch leading to an endpoint that supports only MPS=128.
> > The simplest approach would be to configure every device in the fabric
> > with MPS=128.  That guarantees the endpoint will never receive a TLP
> > with a payload larger than 128 bytes.
> >
> > Here's my understanding of how PCIE_BUS_PERFORMANCE works:
> >
> > There are two ways an endpoint may receive TLPs with data payloads:
> > (1) Memory Write Requests that target the endpoint, and (2)
> > Completions with Data in response to Memory Read Requests generated by
> > the endpoint.
> >
> > In PCIE_BUS_PERFORMANCE mode, we assume Memory Write Requests are not
> > an issue because:
> >
> >   - We assume a CPU Memory Write Request is never larger than MPS (128
> >     bytes in this case).  This is fairly safe because CPUs generally
> >     can't write more than a cache line in one request, and most CPUs
> >     have cache lines of 128 bytes or less.
>
> Fair assumption.
>
> >   - We assume there's no peer-to-peer DMA, so other devices in the
> >     fabric will never send Memory Write Requests to the endpoint, so
> >     we don't need to limit their MPS settings.
> >
> > That leaves Completions.  We limit the size of Completions by limiting
> > MRRS.  If we set the endpoint's MRRS to its MPS (128 in this case), it
> > will never request more than MPS bytes at a time, so it will never
> > receive a Completion with more than MPS bytes.
> >
> > Therefore, we may be able to configure other devices in the fabric
> > with MPS larger than 128, which may benefit those devices.
>
> This is still problematic. One application may be doing a lot of
> writes compared to reads. We prefer maximizing endpoint write
> performance compared to read performance by reducing the MRRS
> setting.

Help me understand exactly what is problematic.
No matter what your read/write mix is, a single device in isolation
should get the best performance with both MPS and MRRS at the highest
possible settings.

Reducing MPS may be necessary if there are several devices in the
hierarchy and one requires a smaller MPS than the others.  That
obviously reduces the maximum read and write performance.

Reducing MRRS may be useful to prevent one device from hogging a link,
but of course it reduces read performance for that device because we
need more read requests.