Re: [PATCH net-next 00/19] Mellanox, mlx5 sub function support

Jiri Pirko <jiri@xxxxxxxxxxx> · Fri, 8 Nov 2019 22:39:52 +0100

Fri, Nov 08, 2019 at 10:21:20PM CET, jakub.kicinski@xxxxxxxxxxxxx wrote:
>On Fri, 8 Nov 2019 20:41:18 +0100, Jiri Pirko wrote:
>> Fri, Nov 08, 2019 at 08:06:40PM CET, jakub.kicinski@xxxxxxxxxxxxx wrote:
>> >On Fri, 8 Nov 2019 13:12:33 +0100, Jiri Pirko wrote:  
>> >> Thu, Nov 07, 2019 at 09:32:34PM CET, jakub.kicinski@xxxxxxxxxxxxx wrote:  
>> >> >On Thu,  7 Nov 2019 10:04:48 -0600, Parav Pandit wrote:    
>> >> >> Mellanox sub function capability allows users to create several hundreds
>> >> >> of networking and/or rdma devices without depending on PCI SR-IOV support.    
>> >> >
>> >> >You call the new port type "sub function" but the devlink port flavour
>> >> >is mdev.
>> >> >
>> >> >As I'm sure you remember you nacked my patches exposing NFP's PCI 
>> >> >sub functions which are just regions of the BAR without any mdev
>> >> >capability. Am I in the clear to repost those now? Jiri?    
>> >> 
>> >> Well, the question is whether it makes sense to have SFs without having
>> >> them as mdev. I mean, we discussed the modelling thoroughly and eventually we
>> >> realized that in order to model this correctly, we need SFs on "a bus".
>> >> Originally we were thinking about custom bus, but mdev is already there
>> >> to handle this.  
>> >
>> >But the "main/real" port is not a mdev in your case. NFP is like mlx4. 
>> >It has one PCI PF for multiple ports.  
>> 
>> I don't see how relevant the number of PFs-vs-uplink_ports is.
>
>Well. We have a slice per external port, the association between the
>port and the slice becomes irrelevant once switchdev mode is enabled,
>but the queues are assigned statically so it'd be a waste of resources
>to not show all slices as netdevs.
>
>> >> Our SFs are also just regions of the BAR, same thing as you have.
>> >> 
>> >> Can't you do the same for nfp SFs?
>> >> Then the "mdev" flavour is enough for all.  
>> >
>> >Absolutely not. 
>> >
>> >Why not make the main device of mlx5 a mdev, too, if that's acceptable.
>> >There's (a) long precedent for multiple ports on one PCI PF in
>> >networking devices, (b) plenty of deployed software
>> >which depends on the main devices hanging off the PCI PF directly.
>> >
>> >The point of mdevs is being able to assign them to VFs or run DPDK on
>> >them (map to user space).
>> >
>> >For normal devices, the existing sysfs hierarchy, where one device has
>> >multiple children of a certain class without a bus and a separate
>> >driver, is perfectly fine. Do you think we should also slice all serial
>> >chips into mdevs if they have multiple lines?
>> >
>> >Exactly as I predicted, there is much confusion about what's being
>> >achieved here, heh :)
>> 
>> Please let me understand how your device is different.
>> Originally Parav didn't want to have mlx5 subfunctions as mdev. He
>> wanted to have them tied to the same pci device as the pf. No
>> difference from what you describe you want. However, while we thought
>> about how to fit things in, how to handle phys_port_name, and how to see
>> things in sysfs, we came up with the idea of a dedicated bus.
>
>The difference is that there is naturally a main device and subslices
>with this new mlx5 code. In mlx4 or nfp all ports are equal and
>statically allocated when FW initializes based on port breakout.

Ah, I see. I was missing the static part in nfp. Now I understand. It is
just another "pf", but not a real pf in the pci terminology, right?

>
>Maybe it's the fact I spent last night at an airport, but I feel
>like I'm arguing about this more strongly than I actually care :)
>
>> We took it upstream and people suggested to use mdev bus for this.
>> 
>> Parav, please correct me if I'm wrong, but I don't think there is a plan
>> to push SFs into a VM or to userspace as Jakub expects, right?
>
>There's definitely a plan to push them to VFs, I believe that was part
>of the original requirements, otherwise there'd be absolutely no need
>for a bus to begin with.