Re: [PATCHv2] pci: properly handle out-of-order SRIOV virtual functions

Laine Stump <laine@xxxxxxxxx> · Fri, 08 Nov 2013 13:13:44 +0200

On 11/07/2013 02:28 PM, Laine Stump wrote:
> On 11/07/2013 02:08 PM, Martin Kletzander wrote:
>> On Thu, Nov 07, 2013 at 01:10:39PM +0200, Laine Stump wrote:
>>> This resolves:
>>>
>>>   https://bugzilla.redhat.com/show_bug.cgi?id=1025397
>>>
>>> When libvirt creates the list of an SRIOV Physical Function's (PF)
>>> Virtual Functions (VF), it assumes that the order of "virtfn*" links
>>> returned by readdir() from the PF's sysfs directory is already in the
>>> correct order. Experience has shown that this is not always the case.
>>>
>>> This results in 1) incorrect assumptions made by consumers of the
>>> output of the virt_functions list of virsh nodedev-dumpxml, and 2)
>>> setting MAC address and vlan tag on the wrong VF (since libvirt uses
>>> netlink to set mac address and vlan tag, netlink requires the VF#, and
>>> the function virPCIGetVirtualFunctionIndex() returns the wrong index
>>> due to the improperly ordered VF list). See the bugzilla report for an
>>> example of this improper behavior.
>>>
>>> The solution provided by this patch is for virPCIGetVirtualFunctions
>>> to first gather all the "virtfn*" names from the PFs sysfs directory,
>>> then allocate a virtual_functions array large enough to hold all
>>> entries, and finally to create a device entry for each virtfn* name
>>> and place it into the virtual_functions array at the proper index
>>> (based on interpreting the value following "virtfn" in the name).
>>>
>>> Checks are introduced to ensure that 1) the virtfn list given in sysfs
>>> is not sparse (ie, there are no indexes larger than the length of the
>>> list), and that no two virtfn* entries decode to the same index.
>>> ---
>>>
>>> Difference from V1: Based on jtomko's review, truncate trailing NULLs
>>> in virtual_functions array, and actually check for NULLs in the middle
>>> (a sparse array) and generate an error in that case, as we already
>>> claimed to do in the commit message.
>>>
>>> (Sorry for the lack of inter-diff, but the only changes from V1 are
>>> the addition of the lines starting with "truncate any trailing nulls"
>>> and continuing until the next VIR_DEBUG.)
>>>
>>>  src/util/virpci.c | 115 ++++++++++++++++++++++++++++++++++++++++++------------
>>>  1 file changed, 90 insertions(+), 25 deletions(-)
>>>
>> Disclaimer: I don't want this to sound rude (just that it does to me
>> every time I read it), this is just a question that popped in my mind.
> Not at all. Alternatives are always welcome.
>
>> Why don't we do it differently?  If we have a while loop with a number
>> starting from 1 and access(, F_OK)'ing '"virtfn%d", number' in that
>> loop, we'll have all those VFs sorted.  And if that's not nice enough,
>> we should be able to read the number of VFs from the device's
>> "/config" (in offset 0x10 [1]) and do a nice, clean for loop over
>> already known filenames.
>>
>> Or am I missing something?  I just Took that from "drivers/pci/iov.c",
>> so this may be "Just Wrong".
> No, that would work too. I was just looking at it from the inside rather
> than the outside, so was only thinking of how to fix the current code
> rather than how to rewrite it, and never considered that.
>
> My one concern would be if someone actually pushed a patch into the
> kernel similar to the one suggested by the original reporter of this
> problem (which put leading 0's on all the numbers in the virtfn
> filenames), but that seems unlikely, as it isn't scalable (what happens
> when you get > 100 VFs?).
>
> I like your idea enough to redo the patch using it. It may be tomorrow
> before I get it posted though.

I've posted a patch based on this suggestion (2 patches actually, due to
a prerequisite that changes the type of num_virtual_functions from
unsigned int to size_t), and I think the results are very good - rather
than adding new code, about 30 lines were removed. The result is simpler
even than the original, and ends up with correct results.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list