Re: [PATCH v5 RESEND] PCI: Set PCI-E Max Payload Size on fabric

On Sat, Jun 25, 2011 at 08:54:17AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2011-06-24 at 17:42 -0500, Jon Mason wrote:
> > On Thu, Jun 23, 2011 at 11:49 PM, Benjamin Herrenschmidt
> > <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > On Thu, 2011-06-23 at 22:54 -0500, Jon Mason wrote:
> > >> There is a sizable performance boost for having the largest possible
> > >> maximum payload size on each PCI-E device.  However, the maximum payload
> > >> size must be uniform on a given PCI-E fabric, and each device, bridge,
> > >> and root port can have a different max size.  To find and configure the
> > >> optimal MPS settings, one must walk the fabric and determine the largest
> > >> MPS available on all the devices on the given fabric.
> > >
> > > Ok, so a few comments:
> > >
> > >  - Don't do a pci_walk_bus_self() that takes a "self" bool, that's ugly.
> > > If you want common code, then do a __pci_walk_bus() that takes a "self"
> > > argument (call it include_self ?) and two small wrappers for
> > > pci_walk_bus() and pci_walk_bus_self() (though arguably the latter isn't
> > > needed, just use the __ variant directly)
> > >
> > >  - Thinking more, I don't see the point of that pci_walk_bus_self() ...
> > > It will cause a bridge device to be called twice. If all you want is to
> > > make sure the toplevel one is called, just do that manually before you
> > > call pci_walk_bus_self()
> > 
> > If they are nested, then there could be many layers of PCIE switches
> > that need to be configured with the MPS.  I'll look into this a bit
> > more, but it is necessary.
> 
> No, my point is that without your change it should already be calling
> the callback for intermediary bridges/switches. Or am I missing
> something ? The only "missing" case is the top level from what I can
> tell from reading the code.
> 
> > >  - Does pci_walk_bus() provide any guarantee in term of API as to the
> > > order in which it walks the devices ? IE. Parent first, then children ?
> > 
> > I see no guarantee, but I thought it would make you happy to see this
> > patch doing things the same way the patch you sent out did.
> >
> > > That's how it's implemented today but are we certain that will remain ?
> > > It's not documented as such.... Your "forcemax" case as it is
> > > implemented will work only as long as this ordering is respected.
> > 
> > That is an excellent point.  Would a comment above the function
> > warning that changing it will break things, or should I create a
> > simpler function, similar to what I had before, that would guarantee
> > it?
> 
> Either way, what does Jesse prefer?

I added a comment above the function to document the ordering.  So if
it is ever changed, hopefully all callers will be audited and the
comment will be seen.
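
For what it's worth, the parent-before-children ordering the MPS code
depends on can be shown with a tiny userspace model.  The struct and
function names below are simplified stand-ins, not the kernel's real
ones; this is just a sketch of the guarantee the comment documents:

```c
#include <assert.h>

#define MAX_KIDS 4

struct node {
	int id;
	int nkids;
	struct node *kids[MAX_KIDS];
};

static int order[8];
static int visited;

static void record(struct node *n)
{
	order[visited++] = n->id;
}

/* Visit the device itself first, then recurse into its children --
 * the parent-before-child ordering the MPS code relies on. */
static void walk(struct node *n, void (*cb)(struct node *))
{
	int i;

	cb(n);
	for (i = 0; i < n->nkids; i++)
		walk(n->kids[i], cb);
}

/* Build root -> {bridge -> {leaf}, leaf2} and check the visit order.
 * Returns 0 when every parent was seen before its children. */
static int walk_demo(void)
{
	struct node leaf   = { 3, 0, { 0 } };
	struct node leaf2  = { 2, 0, { 0 } };
	struct node bridge = { 1, 1, { &leaf } };
	struct node root   = { 0, 2, { &bridge, &leaf2 } };

	visited = 0;
	walk(&root, record);

	return (order[0] == 0 && order[1] == 1 &&
		order[2] == 3 && order[3] == 2) ? 0 : -1;
}
```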
 
> > >  - I would like a way to specify the MPS of the host bridge, it may
> > > not be the max of the RC P2P bridge (it -generally is- but I'd like a
> > > way to override that easily).
> > 
> > I could do another boot arg for this.
> 
> Please no... not boot args. Those suck. Besides things can be different
> per host bridge, especially on !x86 platforms. Best is pcibios_* helpers
> (that take the bus as an argument) which the arch can populate for these
> things.

Ok, no boot arg.  It's possible to have the arch code specify
pcie_mpss before the buses are walked, which would work around this
issue.  This might be a bit hacky.  If it ends up being too difficult,
we can do something similar to what you suggest above.
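
Something like the following __weak hook is the shape I'd expect for
the pcibios_* variant.  pcibios_max_mps() is a made-up name here, only
to illustrate the override mechanism, not anything in the patch:

```c
#include <assert.h>

struct pci_bus;	/* opaque in this sketch; the real hook takes the bus */

/* Generic default: no extra platform limit, allow the spec maximum.
 * An arch that knows its host bridge's real limit overrides this. */
__attribute__((weak)) int pcibios_max_mps(struct pci_bus *bus)
{
	(void)bus;
	return 4096;
}

/* Core code clamps whatever MPS it computed for the fabric against
 * the platform-provided limit. */
static int pcie_fabric_mps(struct pci_bus *bus, int computed_mps)
{
	int limit = pcibios_max_mps(bus);

	return computed_mps < limit ? computed_mps : limit;
}
```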


> 
> > >  - I think we need to handle MRRS at the same time. We need MRRS to be
> > > clamped by MPS, don't we ? In addition to being clamped top-down.
> > 
> > You are 100% correct.  I was working on a follow-on patch to do it,
> > but I'll just put it into here.
> 
> Ok.

This is implemented in the patch I just sent out.  Please look it
over.
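
The clamping itself boils down to the PCIe Device Control encoding,
where both MPS and MRRS are stored as a 3-bit exponent (size =
128 << field).  Here's a self-contained model of that logic; the
helper names are illustrative, not the ones in the patch:

```c
#include <assert.h>

/* Decode a 3-bit DEVCTL field into a byte size. */
static int pcie_field_to_bytes(int field)
{
	return 128 << field;
}

/* Encode a byte size (a power of two in 128..4096) into the field. */
static int pcie_bytes_to_field(int bytes)
{
	int field = 0;

	while ((128 << field) < bytes)
		field++;
	return field;
}

/* Clamp a device's MRRS so it never exceeds the fabric's MPS. */
static int pcie_clamp_mrrs(int mrrs_bytes, int mps_bytes)
{
	return mrrs_bytes > mps_bytes ? mps_bytes : mrrs_bytes;
}
```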

Thanks,
Jon

> 
> > >  - For MRRS we -must- have the arch provide the max of the host bridge
> > 
> > I'm not sure if this is a must, unless there is some arch requirement.
> 
> Well, the RC doesn't expose its capabilities. So you basically don't know
> what your PHB MRRS limit is unless the arch code tells you. On x86, you
> probably just rely on whatever the BIOS sets up, but a lot of other
> archs out there need to be able to set up the PCIe hierarchy from
> scratch. That's why my patch has this pcie_settings structure that is
> attached to a host bridge which contains the arch provided MPS and MRRS
> for that bridge.
> 
> > IIRC, MRRS is more for bus fairness.  In theory, we would want it to
> > be uniform across all of the system to prevent bus hogging by a device
> > with a larger MRRS.
> 
> It's also about correctness. I've seen cases of devices not working
> properly due to MRRS violations.
> 
> >   Of course, doing so will have a negative impact
> > on performance.  Perhaps we should have 2 settings:
> > 1) performance - set the MPS to the parent MPS and the MRRS to the device MPSS
> > 2) safety - set the MPS uniformly to the smallest on the fabric and
> > the MRRS to the same value
> > Both of these could be overridden by arch specified value.
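
[For clarity, the two settings proposed above could look like this in
code.  The enum and helper names are only illustrative of the
proposal, not an interface from the patch:]

```c
#include <assert.h>

enum pcie_bus_config {
	PCIE_BUS_PERFORMANCE,	/* MPS from parent, MRRS up to device MPSS */
	PCIE_BUS_SAFE,		/* smallest MPS on the fabric, MRRS matches */
};

/* Pick the MPS for a device under the chosen policy. */
static int pcie_pick_mps(enum pcie_bus_config mode,
			 int parent_mps, int fabric_min_mps)
{
	return mode == PCIE_BUS_PERFORMANCE ? parent_mps : fabric_min_mps;
}

/* Pick the MRRS for a device under the chosen policy. */
static int pcie_pick_mrrs(enum pcie_bus_config mode,
			  int dev_mpss, int fabric_min_mps)
{
	return mode == PCIE_BUS_PERFORMANCE ? dev_mpss : fabric_min_mps;
}
```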
> > 
> > >  - pcie_mps_forcemax -> pcibios_use_max_mps(bus) or something like that.
> > > I want it to be a function so I can make it per-platform or per host
> > > bridge even if I have to. Might be a good spot to add an argument for
> > > the bridge to return a platform maximum to clamp everything else as
> > > well.
> > >
> > >  - Finally, should we make this "forcemax" the default behaviour or are
> > > we afraid we'll break some nvidia gunk ? (I doubt it will break, that
> > > would imply the two cards come up with small & different MPS in
> > 
> > I'd be fine with that, as long as I can have a warning message telling
> > them to use the more secure way of doing it if they have issues.
> > Sound reasonable?
> > 
> > > practice). Do we want to keep an API for drivers who want to clamp
> > > things between two devices who want to do peer to peer ?
> > 
> > I think that is a larger issue that I was wanting to address in this
> > patch.  I can do this as a follow on patch if there is anyone out
> > there that is interested in having a peer-peer framework.
> > 
> > Thanks so much for reviewing the patch!
> 
> Cheers,
> Ben.
> 
> > Thanks,
> > Jon
> > 
> > > Cheers,
> > > Ben.
> > >
> > >
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html