Re: [PATCH v5 RESEND] PCI: Set PCI-E Max Payload Size on fabric

On Thu, Jun 23, 2011 at 11:49 PM, Benjamin Herrenschmidt
<benh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 2011-06-23 at 22:54 -0500, Jon Mason wrote:
>> There is a sizable performance boost for having the largest possible
>> maximum payload size on each PCI-E device.  However, the maximum payload
>> size must be uniform on a given PCI-E fabric, and each device, bridge,
>> and root port can have a different max size.  To find and configure the
>> optimal MPS settings, one must walk the fabric and determine the largest
>> MPS available on all the devices on the given fabric.
>
> Ok, so a few comments:
>
>  - Don't do a pci_walk_bus_self() that takes a "self" bool, that's ugly.
> If you want common code, then do a __pci_walk_bus() that takes a "self"
> argument (call it include_self ?) and two small wrappers for
> pci_walk_bus() and pci_walk_bus_self() (though arguably the latter isn't
> needed, just use the __ variant directly)
>
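
For the wrapper split, something along these lines is what I'd do
(just a sketch; __pci_walk_bus() and include_self are the hypothetical
names from your suggestion):

static void __pci_walk_bus(struct pci_bus *bus,
			   int (*cb)(struct pci_dev *, void *),
			   void *userdata, bool include_self)
{
	if (include_self && bus->self)
		cb(bus->self, userdata);	/* visit the bridge itself */
	pci_walk_bus(bus, cb, userdata);	/* existing walker does the rest */
}

static void pci_walk_bus_self(struct pci_bus *bus,
			      int (*cb)(struct pci_dev *, void *),
			      void *userdata)
{
	__pci_walk_bus(bus, cb, userdata, true);
}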
>  - Thinking more, I don't see the point of that pci_walk_bus_self() ...
> It will cause a bridge device to be called twice. If all you want is to
> make sure the toplevel one is called, just do that manually before you
> call pci_walk_bus_self()

If switches are nested, there could be many layers of PCI-E switches
that need their MPS configured.  I'll look into this a bit more, but I
believe it is necessary.

>  - Does pci_walk_bus() provide any guarantee in terms of API as to the
> order in which it walks the devices ? IE. Parent first, then children ?

I see no such guarantee, but I thought it would make you happy to see
this patch doing things similarly to the patch you sent out.

> That's how it's implemented today but are we certain that will remain ?
> It's not documented as such.... Your "forcemax" case as it is
> implemented will work only as long as this ordering is respected.

That is an excellent point.  Would a comment above the function
warning that changing the order will break things be enough, or should
I create a simpler function, similar to what I had before, that would
guarantee it?
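
E.g. something recursive that bakes parent-before-child ordering in by
construction; a rough sketch (pcie_walk_parent_first() is a made-up
name):

static void pcie_walk_parent_first(struct pci_bus *bus,
				   void (*cb)(struct pci_dev *))
{
	struct pci_dev *dev;

	list_for_each_entry(dev, &bus->devices, bus_list) {
		cb(dev);			/* configure this device first */
		if (dev->subordinate)		/* then recurse into its children */
			pcie_walk_parent_first(dev->subordinate, cb);
	}
}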

>  - I would like a way to specify the MPS of the host bridge, it may
> not be the max of the RC P2P bridge (it -generally is- but I'd like a
> way to override that easily).

I could do another boot arg for this.
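Something like this, say (the parameter name is made up for
illustration):

/* -1 means no override; set via "pcie_hostbridge_mps=<bytes>" */
static int pcie_hostbridge_mps __read_mostly = -1;

static int __init pcie_hostbridge_mps_setup(char *str)
{
	int mps;

	if (str && kstrtoint(str, 0, &mps) == 0)
		pcie_hostbridge_mps = mps;
	return 0;
}
early_param("pcie_hostbridge_mps", pcie_hostbridge_mps_setup);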

>  - I think we need to handle MRRS at the same time. We need MRRS to be
> clamped by MPS, don't we ? In addition to being clamped top-down.

You are 100% correct.  I was working on a follow-on patch to do it,
but I'll just fold it into this one.
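
The clamping itself should be trivial; a sketch, assuming helpers
along the lines of pcie_get_mps()/pcie_get_readrq()/pcie_set_readrq():

static void pcie_clamp_readrq(struct pci_dev *dev)
{
	int mps = pcie_get_mps(dev);		/* current MPS, in bytes */
	int rq  = pcie_get_readrq(dev);		/* current MRRS, in bytes */

	if (rq > mps)
		pcie_set_readrq(dev, mps);	/* keep MRRS <= MPS */
}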

>  - For MRRS we -must- have the arch provide the max of the host bridge

I'm not sure if this is a must, unless there is some arch requirement.

IIRC, MRRS is more about bus fairness.  In theory, we would want it to
be uniform across the whole system to prevent bus hogging by a device
with a larger MRRS.  Of course, doing so will have a negative impact
on performance.  Perhaps we should have two settings:
1) performance - set the MPS to the parent MPS and the MRRS to the
device's MPSS
2) safety - set the MPS uniformly to the smallest value on the fabric
and set the MRRS to the same value
Both of these could be overridden by an arch-specified value.
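
Roughly this, with illustrative names (pcie_set_mps(), pcie_get_mps()
and dev->pcie_mpss are assumptions here, not existing interfaces):

enum pcie_bus_policy { PCIE_POLICY_PERFORMANCE, PCIE_POLICY_SAFETY };

static void pcie_apply_policy(struct pci_dev *dev, enum pcie_bus_policy p,
			      int fabric_min_mps)
{
	if (p == PCIE_POLICY_PERFORMANCE) {
		/* take MPS from the parent, MRRS up to the device's MPSS */
		pcie_set_mps(dev, pcie_get_mps(dev->bus->self));
		pcie_set_readrq(dev, 128 << dev->pcie_mpss);
	} else {
		/* uniform smallest MPS on the fabric, MRRS matched to it */
		pcie_set_mps(dev, fabric_min_mps);
		pcie_set_readrq(dev, fabric_min_mps);
	}
}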

>  - pcie_mps_forcemax -> pcibios_use_max_mps(bus) or something like that.
> I want it to be a function so I can make it per-platform or per host
> bridge even if I have to. Might be a good spot to add an argument for
> the bridge to return a platform maximum to clamp everything else as
> well.
>
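
A per-platform weak default that arches (or host bridges) can
override is what I'd picture here; a minimal sketch, keeping the name
you suggested:

int __weak pcibios_use_max_mps(struct pci_bus *bus)
{
	return 0;	/* conservative default; arches override as needed */
}
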
>  - Finally, should we make this "forcemax" the default behaviour or are
> we afraid we'll break some nvidia gunk ? (I doubt it will break, that
> would imply the two cards come up with small & different MPS in

I'd be fine with that, as long as I can have a warning message telling
users to fall back to the safer setting if they have issues.  Sound
reasonable?

> practice). Do we want to keep an API for drivers who want to clamp
> things between two devices who want to do peer to peer ?

I think that is a larger issue than I was wanting to address in this
patch.  I can do it as a follow-on patch if anyone out there is
interested in having a peer-to-peer framework.
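
If someone does want it, the shape could be as simple as this sketch
(hypothetical function; in practice every bridge on the path between
the two devices would need the same clamp):

static int pcie_clamp_mps_pair(struct pci_dev *a, struct pci_dev *b)
{
	int mps = min(pcie_get_mps(a), pcie_get_mps(b));

	pcie_set_mps(a, mps);
	pcie_set_mps(b, mps);
	return mps;
}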

Thanks so much for reviewing the patch!

Thanks,
Jon

> Cheers,
> Ben.
>
>