Re: [PATCH] PCI: Remove MRRS modification from MPS setting code

On Wed, 2011-09-07 at 10:13 +0200, Rolf Eike Beer wrote:
> > On Tue, 2011-09-06 at 12:12 -0700, Jon Mason wrote:
> >> > Are these typically hitting with the "performance" option ? IE. It
> >> make
> >> > sense to leave MRRS untouched in the "safe" case.
> >>
> >> The patch I sent out still used the "performance" option without
> >> modifying the default MRRS of the device.  All who have tested it
> >> say that it resolves their issues.
> >
> > But that will cause other issues as I described, if the MRRS ends up
> > larger than the MPS. IE. The MRRS of a device must be set lower than
> > or equal to the MPS of that device (not of the parent btw) if we allow
> > the parent(s) to have a larger MPS.
> 
> I don't think so. Just looking into my lspci output I found more than one
> occurrence of things like this:
> 
> MaxPayload 128 bytes, MaxReadReq 512 bytes
>
> Which is perfectly fine. The requester (i.e. the device) may issue read
> requests of up to 512 bytes at once, which is a matter of transfer credits
> and the like (IIRC). The completer may split the completions into packets
> of any valid size between the minimum size (IIRC 64 bytes) and the maximum
> _payload_ size the requester can handle. In this case you will likely get
> either 8*64 or 4*128 byte completions for a 512 byte read request, but
> any combination in between would be valid, too.

No. If the MPS of the host bridge is larger than 128, then the host
bridge may return read completions with payloads up to the MRRS.

> One needs to keep in mind that a 512 byte read request is itself only a
> 12 or 16 byte packet, so it isn't affected by the payload size at all.

The problem is of course not the request packet itself but the response.

> > I -did- hit a very real problem with adapters where that wasn't true.
> 
> From my understanding this shows that these adapters were broken, nothing
> else. An adapter must be prepared to get a bunch of smaller packets
> anyway, as it has no control of how the completer sends out the data. So
> maybe your adapters just got screwed by awaiting a 512 byte reply and
> getting 4*128? You could connect them to some sort of bastard completer if
> you have one (I don't) that completes every request in packets of 64 byte
> and see if they will not even explode with 128 byte MRRS.

Unfortunately, I didn't manage to get a good TLP capture of the problem
packets in AER. But basically what happens is:

 - Host bridge has a large MPS (for example 4096)
 - Device has a smaller MPS (for example 128)
 - Device has a large MRRS (for example 512)

What I observed is that when receiving network packets larger than roughly
128 bytes (I didn't get a precise packet size threshold, as I wasn't doing
the tests myself), the device appears to get read -responses- larger
than its MPS (up to its MRRS, ie, the size it specified in the read
request), and shoots a UE upstream.

This happens with e1000e's, so I doubt it's a broken PCIe implementation
in the device, and it makes sense all things considered.

Since the host bridge has an MPS larger than 128, it is allowed to send
a read response using a large TLP, which will be rejected by the device.

The "safe" approach of course is to clamp all MPS to the minimum, but
that leads to way too many situations where everybody gets down to 128
bytes because -one- device in the system has 128 bytes, and that means
that anything that has a hotplug slot must clamp everybody as well.

Cheers,
Ben.
 
