On 1/31/2018 10:37 PM, Bjorn Helgaas wrote:
> On Wed, Jan 31, 2018 at 07:13:56PM -0500, Sinan Kaya wrote:
>> On 1/31/2018 7:01 PM, Myron Stowe wrote:
>>>> I think from above examples:
>>>> 1. perf mode is moving devices to 256 MPS as it can.
>>>> 2. safe mode is setting to 128 MPS
>>>> 3. perf mode set MRRS=MPS is a CORRECT call for device with MPSC lower than its parents.
>>>> 4. perf mode set MRRS=MPS is not necessary for a device with SAME MPSC as its parents?
>>>> 5. it is an interested point to me that slot/switch/root MRRS are always set to 128B, I have not found out why.
>>>
>>> In Sinan's original posting, a reference to
>>> https://www.xilinx.com/support/documentation/white_papers/wp350.pdf
>>> was provided.  When I read that paper and got to the "Read
>>> Completion Boundary" section I thought to myself: "If RCB can only
>>> be 64 or 128 bytes then what's the point of MPS (or MRRS) as all
>>> TLP completions would be limited to 64 or 128 bytes?  (see also the
>>> paper's 'Read Completions with the RCB Set to 64 Bytes' figure)".
>>> I brought this up to a colleague and they surmised that possibly
>>> only _lower end_ (a.k.a. lazy) chipset implementations would truly
>>> have RCB limited sized completions; higher end chipsets would of
>>> course have to comply with RCB when communicating with the memory
>>> controller but could then aggregate data into larger MPS (or MRRS)
>>> sized TLP completion packets.  Perhaps this might explain why you
>>> always saw slot/switch/root values set at 128B?
>>
>> I have looked at that paper before. It is plain wrong. Read
>> Completion Boundary is about the alignment of addresses that an
>> endpoint is sending in memory read packets according to the spec. It
>> has nothing to do with the packet size.
>
> I hope this isn't being too pedantic, but I think RCB applies to the
> *completer*, not the requester.  Typically an endpoint generates a
> memory read and a root port supplies data to complete it.
>
> I didn't see anything in the "Read Completion Boundary" section of the
> paper that I thought *contradicted* the spec.  I did think this was
> speculation:
>
>   Typically, most root complexes set the RCB at 64 bytes and return
>   data in 64-byte completions instead of what might be allowed by the
>   MPS.
>
> That would be legal per spec, but I don't have direct knowledge of
> whether root complexes actually do that.  I tend to doubt it because
> it seems like people do observe performance differences based on the
> MPS setting.
>
> I agree that the "Read Transaction Throughput" section is inaccurate
> when it says:
>
>   The size of the completion is determined by a completer's read
>   completion boundary.
>
> RCB may limit the size of the first completion to less than MPS, but
> subsequent completions (except the final one) only need to be a
> multiple of RCB.
>
> RCB determines the boundaries at which a completer is allowed to end a
> TLP (PCIe r4.0, sec 2.3.1.1 has the rules).  Let's work through an
> example:
>
>   MPS=128
>   RCB=64
>
>   Endpoint generates a 256-byte read request for addresses 0x2-0x101
>
> In the absence of RCB, a root port could satisfy this request with two
> completions:
>
>   completion 0: bytes 0x2-  0x81 (128 bytes)  <-- illegal
>   completion 1: bytes 0x82-0x101 (128 bytes)
>
> But with RCB, the first completion is illegal because it doesn't
> satisfy the entire request and it doesn't end on a multiple of RCB.
> Because of RCB the completer must respond with at least three
> completions.
> The best it can do is this:
>
>   completion 0: bytes 0x2-   0x7f (126 bytes)
>   completion 1: bytes 0x80-  0xff (128 bytes)
>   completion 2: bytes 0x100-0x101 (2 bytes)
>
> It is *allowed* but not required to respond with more completions,
> e.g.,
>
>   completion 0: bytes 0x2-   0x3f (62 bytes)
>   completion 1: bytes 0x40-  0x7f (64 bytes)
>   completion 2: bytes 0x80-  0xbf (64 bytes)
>   completion 3: bytes 0xc0-  0xff (64 bytes)
>   completion 4: bytes 0x100-0x101 (2 bytes)
>

Very good summary. Thanks for doing a detailed analysis.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
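
For anyone who wants to check the arithmetic in the quoted example, here is a rough sketch of the splitting rule as Bjorn describes it: every completion except the last must end on an RCB boundary, and no completion may exceed MPS. The function name split_completions and its parameters are made up for illustration. It models only these two constraints, assumes MPS >= RCB, and ignores the other rules in the spec (DW granularity, 4KB boundaries, and so on), so it should not be read as how any real root port behaves.

```python
def split_completions(addr, length, mps, rcb):
    """Return the fewest (first_byte, last_byte) ranges allowed by MPS and RCB.

    Simplified model: each completion carries at most `mps` bytes, and every
    completion except the final one must end on a multiple of `rcb`.
    """
    assert mps >= rcb > 0 and length > 0
    completions = []
    end = addr + length              # one past the last requested byte
    while addr < end:
        limit = min(addr + mps, end)     # furthest we may go under MPS
        if limit < end:
            # Not the final completion: pull back to an RCB boundary.
            limit -= limit % rcb
        completions.append((addr, limit - 1))
        addr = limit
    return completions


if __name__ == "__main__":
    # The example from the thread: MPS=128, RCB=64, 256 bytes at 0x2.
    for i, (lo, hi) in enumerate(split_completions(0x2, 256, mps=128, rcb=64)):
        print(f"completion {i}: bytes {lo:#x}-{hi:#x} ({hi - lo + 1} bytes)")
```

With the values from the thread this yields the three-completion split (126, 128 and 2 bytes). Capping the chunk size at 64 bytes instead of 128 reproduces the five-completion variant, which the spec also permits.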