Re: [PATCH] PCI: update device mps when doing pci hotplug

On Mon, Jul 29, 2013 at 9:42 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> On Mon, Jul 29, 2013 at 9:20 PM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>> On 2013/7/30 7:33, Bjorn Helgaas wrote:
>>> On Mon, May 27, 2013 at 9:15 PM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>>> Hi Bjorn and Jon,
>>>>    I'm sorry to disturb you. This patch was sent a long time ago, but nobody seems to have commented on it.
>>>> Do you have any comments on this patch?
>>>>
>>>> This patch tries to update the device MPS in the following cases:
>>>> 1) Target device under a root port
>>>>    Because a root port can split TLPs, a target device MPS greater than the root port MPS is OK.
>>>>    But if the root port MPS is greater than the target device MPS, it's bad, because the target device cannot
>>>>    receive a TLP payload larger than its MPS. So if a target device is under a root port, I think
>>>>    we should assign it an MPS greater than or equal to the root port MPS.
>>>> 2) Target device under a non-root port
>>>>    We assume the target device is both a transmitter and a receiver, so the safest way is to set the target
>>>>    device MPS equal to its parent device's MPS.
>>>
>>> Thanks, I just started reviewing this patch, and your notes above are
>>> exactly the question I was going to ask.  The comments in
>>> pcie_bus_update_set() only tell me what the code does.  I can read the
>>> C code just fine; what we need there is the explanation about *why* we
>>> handle devices below root ports differently than others.  Maybe we can
>>> adapt some of your notes as comments in the code.
>>
>> Hi Bjorn,
>>    Thanks for your review and comments!
>>
>>>
>>> Do you have references to the spec where it talks about this
>>> difference?  I want to make sure we can rely on the fact that a root
>>> port can accept TLPs larger than its MPS.
>>
>> The PCIe spec does not explicitly mention this issue; we can only infer that a
>> root port/root complex can split a TLP into smaller packets, for instance
>> one 256B packet split into two 128B packets.
>>
>> I confirmed this on my x86 machine and my IA64 machine:
>> 1. Unload the NIC driver so the NIC MPS can be changed safely.
>> 2. Use setpci to change the NIC MPS to the max value it supports.
>> 3. Reload the NIC driver.
>> 4. Ping and use scp to copy a large file between machines. The result is OK.

Just as a way to confirm that the MPS change is actually doing
something, I assume you observe a performance difference between
MPS=128 and MPS=512 on the NIC (and the root port MPS=128 in both
cases)?  Or maybe you can confirm with an analyzer that there are
actually 512-byte TLPs on the link?

I assume there are no AER or other errors logged by the root port?
The test you showed was a copy *to* the local machine, so the NIC
would have been doing DMA writes to memory.  I assume it works equally
well doing a copy *from* the local machine to another machine across
the network, where the NIC is doing DMA reads from memory?

> The fact that it works on two pieces of hardware is not enough to be
> confident that it will work on all spec-conforming hardware.  Maybe we
> can deduce this from something in the spec, but I'll have to dig into
> it more tomorrow.  I just hoped that you had a spec reference that
> could save me some time.

The only mention I can find in the spec is sec 1.3.1, where it says "a
Root Complex is generally permitted to split a packet into smaller
packets when routing transactions peer-to-peer between hierarchy
domains ..."

I'm not a hardware guy (I often wish I were :)), but here's how I
interpret that statement.  Let's take the following example:

  00:01.0 Root port bridge to [bus 01] MPS=128
  01:00.1 Endpoint MPS=512

  00:02.0 Root Port bridge to [bus 02] MPS=256
  00:03.0 Root Port bridge to [bus 03] MPS=128
  02:00.0 Endpoint MPS=256
  03:00.0 Endpoint MPS=128

If 02:00.0 (MPS=256) generates a DMA write destined for 03:00.0, it
may transmit a TLP with a data payload of 256 bytes, and 00:02.0
(MPS=256 also) will accept it.  The root complex may route the packet
to 00:03.0 (MPS=128), and here it would need to be split into two
128-byte TLPs before being transmitted by 00:03.0 to 03:00.0
(MPS=128).

Your situation is basically 01:00.1 (MPS=512) doing a DMA write
destined for memory and sending a 512-byte TLP to 00:01.0 (MPS=128).
In this case, the root complex isn't doing any peer-to-peer routing
between hierarchy domains, so I don't think the statement in sec 1.3.1
applies.  So I don't understand why the root port would accept that
TLP.  I would think it would report a malformed TLP error.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
