On Wed, Jan 31, 2018 at 07:13:56PM -0500, Sinan Kaya wrote:
> On 1/31/2018 7:01 PM, Myron Stowe wrote:
> >> I think from above examples:
> >> 1. perf mode is moving devices to 256 MPS as it can.
> >> 2. safe mode is setting to 128 MPS
> >> 3. perf mode set MRRS=MPS is a CORRECT call for device with MPSC lower than its parents.
> >> 4. perf mode set MRRS=MPS is not necessary for a device with SAME MPSC as its parents?
> >> 5. it is an interested point to me that slot/switch/root MRRS are always set to 128B, I have not found out why.
> >
> > In Sinan's original posting, a reference to
> > https://www.xilinx.com/support/documentation/white_papers/wp350.pdf
> > was provided. When I read that paper and got to the "Read
> > Completion Boundary" section I thought to myself: "If RCB can only
> > be 64 or 128 bytes then what's the point of MPS (or MRRS) as all
> > TLP completions would be limited to 64 or 128 bytes? (see also the
> > paper's 'Read Completions with the RCB Set to 64 Bytes' figure)".
> > I brought this up to a colleague and they surmised that possibly
> > only _lower end_ (a.k.a. lazy) chipset implementations would truly
> > have RCB limited sized completions; higher end chipsets would of
> > course have to comply with RCB when communicating with the memory
> > controller but could then aggregate data into larger MPS (or MRRS)
> > sized TLP completion packets. Perhaps this might explain why you
> > always saw slot/switch/root values set at 128B?
>
> I have looked at that paper before. It is plain wrong. Read
> Completion Boundary is about the alignment of addresses that an
> endpoint is sending in memory read packets according to the spec. It
> has nothing to do with the packet size.

I hope this isn't being too pedantic, but I think RCB applies to the
*completer*, not the requester. Typically an endpoint generates a
memory read and a root port supplies data to complete it.

I didn't see anything in the "Read Completion Boundary" section of the
paper that I thought *contradicted* the spec. I did think this was
speculation:

  Typically, most root complexes set the RCB at 64 bytes and return
  data in 64-byte completions instead of what might be allowed by the
  MPS.

That would be legal per spec, but I don't have direct knowledge of
whether root complexes actually do that. I tend to doubt it because
it seems like people do observe performance differences based on the
MPS setting.

I agree that the "Read Transaction Throughput" section is inaccurate
when it says:

  The size of the completion is determined by a completer's read
  completion boundary.

RCB may limit the size of the first completion to less than MPS, but
subsequent completions (except the final one) only need to be a
multiple of RCB.

RCB determines the boundaries at which a completer is allowed to end a
TLP (PCIe r4.0, sec 2.3.1.1 has the rules). Let's work through an
example:

  MPS=128
  RCB=64
  Endpoint generates a 256-byte read request for addresses 0x2-0x101

In the absence of RCB, a root port could satisfy this request with two
completions:

  completion 0: bytes   0x2- 0x81 (128 bytes)  <-- illegal
  completion 1: bytes  0x82-0x101 (128 bytes)

But with RCB, the first completion is illegal because it doesn't
satisfy the entire request and it doesn't end on a multiple of RCB.

Because of RCB the completer must respond with at least three
completions. The best it can do is this:

  completion 0: bytes   0x2- 0x7f (126 bytes)
  completion 1: bytes  0x80- 0xff (128 bytes)
  completion 2: bytes 0x100-0x101 (2 bytes)

It is *allowed* but not required to respond with more completions,
e.g.,

  completion 0: bytes   0x2- 0x3f (62 bytes)
  completion 1: bytes  0x40- 0x7f (64 bytes)
  completion 2: bytes  0x80- 0xbf (64 bytes)
  completion 3: bytes  0xc0- 0xff (64 bytes)
  completion 4: bytes 0x100-0x101 (2 bytes)
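
For anyone who wants to play with other MPS/RCB/address combinations,
here is a rough Python sketch of the splitting used in the example
above. It is only a simplified model of the rules as described in this
mail (no completion larger than MPS; any completion that does not
finish the request must end on an RCB-aligned address), not a
restatement of the spec. The function name split_completions and its
"greedy" parameter are made up for illustration.

```python
def split_completions(addr, length, mps, rcb, greedy=True):
    """Split a read request into completions under a simplified RCB model.

    Returns a list of (first_byte, last_byte, size) tuples.
    greedy=True  -> as few completions as possible (each up to MPS).
    greedy=False -> one completion per RCB-aligned chunk.
    All names here are hypothetical; this is an illustration only.
    """
    # Assumption: MPS is at least RCB and a multiple of it.
    assert mps >= rcb and mps % rcb == 0
    end = addr + length          # one past the last requested byte
    chunks = []
    cur = addr
    while cur < end:
        if greedy:
            limit = cur + mps
            if limit >= end:
                nxt = end                    # this completion finishes the request
            else:
                nxt = (limit // rcb) * rcb   # round down to an RCB boundary
        else:
            # stop at the very next RCB boundary (or the end of the request)
            nxt = min(end, (cur // rcb + 1) * rcb)
        chunks.append((cur, nxt - 1, nxt - cur))
        cur = nxt
    return chunks


if __name__ == "__main__":
    # The request from the example: 256 bytes starting at 0x2, MPS=128, RCB=64
    for greedy in (True, False):
        for i, (first, last, size) in enumerate(
                split_completions(0x2, 256, 128, 64, greedy)):
            print(f"completion {i}: bytes {first:#x}-{last:#x} ({size} bytes)")
        print()
```

With greedy=True this should reproduce the three-completion case, and
with greedy=False the five-completion case. A completer is free to
pick anything in between, so the sketch only shows the two extremes.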