On Fri, 13 Oct 2017 07:01:48 -0500 Steven Royer <seroyer@xxxxxxxxxxxxxxxxxx> wrote: > On 2017-10-13 06:53, Steven Royer wrote: > > On 2017-10-12 22:34, Bjorn Helgaas wrote: > >> [+cc Alex, Bodong, Eli, Saeed] > >> > >> On Thu, Oct 12, 2017 at 02:59:23PM -0500, Bryant G. Ly wrote: > >>> On 10/12/17 1:29 PM, Bjorn Helgaas wrote: > >>> >On Thu, Oct 12, 2017 at 03:09:53PM +1100, Michael Ellerman wrote: > >>> >>Bjorn Helgaas <helgaas@xxxxxxxxxx> writes: > >>> >> > >>> >>>On Fri, Sep 22, 2017 at 09:19:28AM -0500, Bryant G. Ly wrote: > >>> >>>>This patch adds the machine dependent call for > >>> >>>>pcibios_bus_add_device, since the previous patch > >>> >>>>separated the calls out between the PowerNV and PowerVM. > >>> >>>> > >>> >>>>The difference here is that for the PowerVM environment > >>> >>>>we do not want match_driver set because in this environment > >>> >>>>we do not want the VF device drivers to load immediately, due to > >>> >>>>firmware loading the device node when VF device is assigned to the > >>> >>>>logical partition. > >>> >>>> > >>> >>>>This patch will depend on the patch linked below, which is under > >>> >>>>review. > >>> >>>> > >>> >>>>https://patchwork.kernel.org/patch/9882915/ > >>> >>>> > >>> >>>>Signed-off-by: Bryant G. Ly <bryantly@xxxxxxxxxxxxxxxxxx> > >>> >>>>Signed-off-by: Juan J. Alvarez <jjalvare@xxxxxxxxxx> > >>> >>>>--- > >>> >>>> arch/powerpc/platforms/pseries/eeh_pseries.c | 24 ++++++++++++++++++++++++ > >>> >>>> 1 file changed, 24 insertions(+) > >>> >>>> > >>> >>>>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c > >>> >>>>index 6b812ad990e4..45946ee90985 100644 > >>> >>>>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c > >>> >>>>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c > >>> >>>>@@ -64,6 +64,27 @@ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX]; > >>> >>>> static DEFINE_SPINLOCK(slot_errbuf_lock); > >>> >>>> static int eeh_error_buf_size; > >>> >>>>+void pseries_pcibios_bus_add_device(struct pci_dev *pdev) > >>> >>>>+{ > >>> >>>>+ struct pci_dn *pdn = pci_get_pdn(pdev); > >>> >>>>+ > >>> >>>>+ if (!pdev->is_virtfn) > >>> >>>>+ return; > >>> >>>>+ > >>> >>>>+ pdn->device_id = pdev->device; > >>> >>>>+ pdn->vendor_id = pdev->vendor; > >>> >>>>+ pdn->class_code = pdev->class; > >>> >>>>+ > >>> >>>>+ /* > >>> >>>>+ * The following operations will fail if VF's sysfs files > >>> >>>>+ * aren't created or its resources aren't finalized. > >>> >>>>+ */ > >>> >>>>+ eeh_add_device_early(pdn); > >>> >>>>+ eeh_add_device_late(pdev); > >>> >>>>+ eeh_sysfs_add_device(pdev); > >>> >>>>+ pdev->match_driver = -1; > >>> >>>match_driver is a bool, which should be assigned "true" or "false". > >>> >>Above he mentioned a dependency on: > >>> >> > >>> >> [04/10] PCI: extend pci device match_driver state > >>> >> https://patchwork.kernel.org/patch/9882915/ > >>> >> > >>> >> > >>> >>Which makes it an int. > >>> >Oh, right, I missed that, thanks. > >>> > > >>> >>Or has that patch been rejected or something? > >>> >I haven't *rejected* it, but it's low on my priority list, so you > >>> >shouldn't depend on it unless it adds functionality you really need. > >>> >If I did apply that particular patch, I would want some rework because > >>> >it currently obfuscates the match_driver logic. There's no clue when > >>> >reading the code what -1/0/1 mean. > >>> So do you prefer enum's? - If so I can make a change for that. > >>> >Apparently here you *do* want the "-1 means the PCI core will never > >>> >set match_driver to 1" functionality, so maybe you do depend on it. > >>> We depend on the patch because we want that ability to never set > >>> match_driver, > >>> for SRIOV on PowerVM. > >> > >> Is this really new PowerVM-specific functionality? ISTR recent > >> discussions > >> about inhibiting driver binding in a generic way, e.g., > >> http://lkml.kernel.org/r/1490022874-54718-1-git-send-email-bodong@xxxxxxxxxxxx > >> > >>> >If that's the case, how to you ever bind a driver to these VFs? The > >>> >changelog says you don't want VF drivers to load *immediately*, so I > >>> >assume you do want them to load eventually. > >>> > > >>> The VF's that get dynamically created within the configure SR-IOV > >>> call, on the Pseries Platform, wont be matched with a driver. - We > >>> do not want it to match. > >>> > >>> The Power Hypervisor will load the VFs. The VF's will get > >>> assigned(by the user) via the HMC or Novalink in this environment > >>> which will then trigger PHYP to load the VF device node to the > >>> device tree. > >> > >> I don't know what it means for the Hypervisor to "load the VFs." Can > >> you explain that in PCI-speak? > >> > >> The things I know about are: > >> > >> - we set PCI_SRIOV_CTRL_VFE in the PF, which enables VFs > >> - now the VFs respond to config accesses > >> - the PCI core enumerates the VFs by reading their config space > >> - the PCI core builds pci_dev structs for the VFs > >> - the PCI core adds these pci_devs to the bus > >> - we try to bind drivers to the VFs > >> - the VF driver probe function may read VF config space and VF BARs > >> - the VF may be assigned to a guest VM > >> > >> Where does "loading the VFs" fit in? I don't know what HMC, Novalink, > >> or PHYP are. I don't *need* to know what they are, as long as you can > >> explain what's happening in terms of the PCI concepts and generic > >> Linux VMs > >> and device assignment. > >> > >> Bjorn > > > > The VFs will be hotplugged into the VM separately from the enable > > SR-IOV, so the driver will load as part of the hotplug operation. > > > > Steve > > One more point of clarification: when the hotplug happens, the VF will > show up on a virtual PCI bus that is not directly correlated to the real > PCI bus that the PF is on. On that virtual PCI bus, the driver will > match because it won't be set to -1. I'm pretty lost too, but I think what's being said is that the paravirtualized SR-IOV enable creates VFs according to the SR-IOV offset and stride capabilities of the PF, but we're supposed to ignore those (why are we even creating pci_devs for them?) and the hypervisor will actually hotplug the VFs somewhere else. How's that still SR-IOV? Why wouldn't the hypervisor just add the real VFs that we're supposed to use at the offset and stride indicated by the PF SR-IOV capability and mask the VFs that we're not supposed to see? Thanks, Alex