Re: [RFC PATCH] pci: add hook for architectures to disble SR-IOV at runtime

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2011-10-07 at 06:29 -0400, Prarit Bhargava wrote:

> Frankly this is only useful for developers and _a few_ enterprise
> users at this point.  Maybe the value needs to be a config value, but
> that doesn't explain the increase in complexity of the code Nish is
> suggesting.

No, it's all our enterprise users since we are shipping SR-IOV capable
adapters and Linux trying to mess around with VFs makes them not work.

> What we're saying is that the real fix isn't to plunk a bunch of code
> which will be dead in a few months when powerpc gets its act together
> on SRIOV.  Going with a bunch of complex arch-specific code for little
> benefit seems overkill here.

Well, powerpc itself has no problem with SR-IOV, ie, it will work on all
our platforms that don't have a hypervisor :-)

The case of pHyp is special since all the resources are under control of
the hypervisor and I don't know if will -ever- get its act together.
AFAIK, there is no plan at this point for pHyp to support guest OSes in
control of PFs creating VFs at this stage but I don't know for sure.

So we are in the realm of something that the platform code knows at boot
time. Asking the -user- to know about these things (or even care wtf
SR-IOV means or is about) is -NOT- the right approach here. Things
should "just work" to the extent that these adapters should operate in
PF only ode just fine.

Now regarding the actual patch itself, I like neither of the proposed
ones so far. The sriov=off one looks like something that could have some
values for field diagnostics etc... and so might be good to apply
regardless. For the pHyp issues the problem goes deeper, which is why
just disabling sr-iov automatically on those platforms is the easiest
solution for now.

Long run, I'd rather see a bunch of the stuff that sriov_init does at
probe time to move later, after resources have been allocated, to allow
to -not- change the VF count or mess with the page size if the IOV
resource has not been allocated and will not be.

Also, the blind whacking of the page size is something that I suspect
have had very very little testing.

The system page size setting in SR-IOV is a strange and not well
architected beast. It's -not- just the stride between the VFs.

It also tends to impact the layout of various data structures (DMA
descriptors etc...) manipulated by the chip (depending on the vendor),
and I wouldn't be surprised that changing it just plain breaks out
driver (we still don't know exactly what's going on just yet but we know
that going through sriov_init, even without requesting VFs later on,
already breaks some adapters for us, possibly because of that for
example). 

Cheers,
Ben.

> [FWIW -- Bjorn's suggestion of pushing all of this into the sriov
> struct is the way to go]
> 
> P.
> 
> > Ben.
> > 
> >>
> >>>> -----8<------
> >>>>
> >>>> [PATCH] Add a kernel parameter that disables SRIOV
> >>>>
> >>>> Add sriov=[0|1] as a kernel parameter to disable or enable SRIOV.
> This is
> >>>> useful for debugging SRIOV issues vs. bare-metal issues.
> >>>>
> >>>> Signed-off-by: Prarit Bhargava<prarit@xxxxxxxxxx>
> >>>> ---
> >>>>   Documentation/kernel-parameters.txt |    4 ++++
> >>>>   drivers/pci/iov.c                   |   30
> +++++++++++++++++++++++++++++-
> >>>>   2 files changed, 33 insertions(+), 1 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/kernel-parameters.txt
> b/Documentation/kernel-parameters.txt
> >>>> index 854ed5c..235b5c5 100644
> >>>> --- a/Documentation/kernel-parameters.txt
> >>>> +++ b/Documentation/kernel-parameters.txt
> >>>> @@ -2385,6 +2385,10 @@ bytes respectively. Such letter suffixes
> can also be entirely omitted.
> >>>>   	spia_pedr=
> >>>>   	spia_peddr=
> >>>>
> >>>> +	sriov=		[HW]
> >>>> +			Format: [0|1]
> >>>> +			Enable or disable SRIOV support in kernel.
> >>>> +
> >>>>   	stacktrace	[FTRACE]
> >>>>   			Enabled the stack tracer on boot up.
> >>>>
> >>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> >>>> index 42fae47..da96e3f 100644
> >>>> --- a/drivers/pci/iov.c
> >>>> +++ b/drivers/pci/iov.c
> >>>> @@ -408,6 +408,19 @@ static void sriov_disable(struct pci_dev
> *dev)
> >>>>   	iov->nr_virtfn = 0;
> >>>>   }
> >>>>
> >>>> +static unsigned int sriov_enabled = 1;
> >>>> +static int __init do_sriov_enable(char *str)
> >>>> +{
> >>>> +	get_option(&str,&sriov_enabled);
> >>>> +	if (!sriov_enabled)
> >>>> +		printk(KERN_INFO "SRIOV: Disabled\n");
> >>>> +	else
> >>>> +		printk(KERN_INFO "SRIOV: Enabled\n");
> >>>> +
> >>>> +	return 1;
> >>>> +}
> >>>> +__setup("sriov=", do_sriov_enable);
> >>>
> >>> Hrm, we only will print the above is sriov= is passed on the
> kernel
> >>> command-line, right? It seems like maybe we should print it
> always. And
> >>> perhaps the prefix should be "SR-IOV"?
> >>>
> >>>>   static int sriov_init(struct pci_dev *dev, int pos)
> >>>>   {
> >>>>   	int i;
> >>>> @@ -419,6 +432,12 @@ static int sriov_init(struct pci_dev *dev,
> int pos)
> >>>>   	struct resource *res;
> >>>>   	struct pci_dev *pdev;
> >>>>
> >>>> +	if (!sriov_enabled) {
> >>>> +		rc = -EPERM;
> >>>> +		/* in theory no flags should have been set ... */
> >>>> +		goto failed;
> >>>> +	}
> >>>> +
> >>>
> >>> Hrm, is this needed if pci_enable_sriov() checks it already? Isn't
> that
> >>> the only caller?
> >>>
> >>>>   	if (dev->pcie_type != PCI_EXP_TYPE_RC_END&&
> >>>>   	dev->pcie_type != PCI_EXP_TYPE_ENDPOINT)
> >>>>   		return -ENODEV;
> >>>> @@ -614,7 +633,10 @@ resource_size_t
> pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
> >>>>   	struct resource tmp;
> >>>>   	enum pci_bar_type type;
> >>>>   	int reg = pci_iov_resource_bar(dev, resno,&type);
> >>>> -	
> >>>> +
> >>>> +	if (!sriov_enabled)
> >>>> +		return 0;
> >>>> +
> >>>>   	if (!reg)
> >>>>   		return 0;
> >>>>
> >>>> @@ -667,6 +689,9 @@ int pci_enable_sriov(struct pci_dev *dev, int
> nr_virtfn)
> >>>>   {
> >>>>   	might_sleep();
> >>>>
> >>>> +	if (!sriov_enabled)
> >>>> +		return -EPERM;
> >>>> +
> >>>>   	if (!dev->is_physfn)
> >>>>   		return -ENODEV;
> >>>>
> >>>> @@ -682,6 +707,9 @@ void pci_disable_sriov(struct pci_dev *dev)
> >>>>   {
> >>>>   	might_sleep();
> >>>>
> >>>> +	if (!sriov_enabled)
> >>>> +		return;
> >>>> +
> >>>>   	if (!dev->is_physfn)
> >>>>   		return;
> >>>
> >>> Ah my patch needs these three checks as well, FWIW.
> >>>
> >>> Thanks,
> >>> Nish
> >>>
> > 
> > 


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux