Re: [net-next v4 00/15] Add mlx5 subfunction support

Alexander Duyck <alexander.duyck@xxxxxxxxx> · Tue, 15 Dec 2020 18:19:18 -0800

On Tue, Dec 15, 2020 at 4:20 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Tue, Dec 15, 2020 at 01:41:04PM -0800, Alexander Duyck wrote:
>
> > > not just devlink and switchdev, auxbus was also introduced to
> > > standardize some of the interfaces.
> >
> > The auxbus is just there to make up for the fact that there isn't
> > another bus type for this though. I would imagine otherwise this would
> > be on some sort of platform bus.
>
> Please lets not start this again. This was gone over with Greg for
> literally a year and a half and he explicitly NAK'd platform bus for
> this purpose.
>
> Aux bus exists to connect different kernel subsystems that touch the
> same HW block together. Here we have the mlx5_core subsystem, vdpa,
> rdma, and netdev all being linked together using auxbus.
>
> It is kind of like what MFD does, but again, using MFD for this was
> also NAK'd by Greg.

Sorry I wasn't saying we needed to use MFD or platform bus. I'm aware
of the discussions I was just saying there are some parallels, not
that we needed to go through and use that.

> At the very worst we might sometime find out there is some common
> stuff between ADIs that we might get an ADI bus, but I'm not
> optimistic. So far it looks like there is no commonality.
>
> Aux bus has at least 4 users already in various stages of submission,
> and many other target areas that should be replaced by it.

Aux bus is fine and I am happy with it. I was just saying that auxbus
isn't something that should really be used to say that this is
significantly different from mdev as they both rely on a bus topology.

> > I would really like to see is a solid standardization of what this is.
> > Otherwise the comparison is going to be made. Especially since a year
> > ago Mellanox was pushing this as an mdev type interface.
>
> mdev was NAK'd too.
>
> mdev is only for creating /dev/vfio/*.

Agreed. However my worry is that as we start looking to make this
support virtualization it will still end up swinging more toward mdev.
I would much prefer to make certain that any mdev is a flavor and
doesn't end up coloring the entire interface.

> > That is all well and good. However if we agree that SR-IOV wasn't done
> > right saying that you are spinning up something that works just like
> > SR-IOV isn't all that appealing, is it?
>
> Fitting into some universal least-common-denominator was never a goal
> for SR-IOV, so I wouldn't agree it was done wrong.

It isn't so much about right or wrong but he use cases. My experience
has been that SR-IOV ends up being used for very niche use cases where
you are direct assigning it into either DPDK or some NFV VM and you
are essentially building the application around the NIC. It is all
well and good, but for general virtualization it never really caught
on.

My thought is that if we are going to do this we need to do this in a
way that improves the usability, otherwise we are looking at more
niche use cases.

> > I am talking about my perspective. From what I have seen, one-off
> > features that are only available from specific vendors are a pain to
> > deal with and difficult to enable when you have to support multiple
> > vendors within your ecosystem. What you end up going for is usually
> > the lowest common denominator because you ideally want to be able to
> > configure all your devices the same and have one recipe for setup.
>
> So encourage other vendors to support the switchdev model for managing
> VFs and ADIs!

Ugh, don't get me started on switchdev. The biggest issue as I see it
with switchev is that you have to have a true switch in order to
really be able to use it. As such dumbed down hardware like the ixgbe
for instance cannot use it since it defaults to outputting anything
that doesn't have an existing rule to the external port. If we could
tweak the design to allow for more dumbed down hardware it would
probably be much easier to get wider adoption.

Honestly, the switchdev interface isn't what I was talking about. I
was talking about the SF interface, not the switchdev side of it. In
my mind you can place your complexity into the switchdev side of the
interface, but keep the SF interface simple. Then you can back it with
whatever you want, but without having to have a vendor specific
version of the interface being plugged into the guest or container.

One of the reasons why virtio-net is being pushed as a common
interface for vendors is for this reason. It is an interface that can
be emulated by software or hardware and it allows the guest to run on
any arbitrary hardware.

> > I'm not saying you cannot enable those features. However at the same
> > time I am saying it would be nice to have a vendor neutral way of
> > dealing with those if we are going to support SF, ideally with some
> > sort of software fallback that may not perform as well but will at
> > least get us the same functionality.
>
> Is it really true there is no way to create a software device on a
> switchdev today? I looked for a while and couldn't find
> anything. openvswitch can do this, so it does seem like a gap, but
> this has nothing to do with this series.

It has plenty to do with this series. This topic has been under
discussion since something like 2017 when Mellanox first brought it up
at Netdev 2.1. At the time I told them they should implement this as a
veth offload. Then it becomes obvious what the fallback becomes as you
can place packets into one end of a veth and it comes out the other,
just like a switchdev representor and the SF in this case. It would
make much more sense to do it this way rather than setting up yet
another vendor proprietary interface pair.

> A software switchdev path should still end up with the representor and
> user facing netdev, and the behavior of the two netdevs should be
> identical to the VF switchdev flow we already have today.
>
> SF doesn't change any of this, it just shines a light that, yes,
> people actually have been using VFs with netdevs in containers and
> switchdev, as part of their operations.
>
> FWIW, I view this as a positive because it shows the switchdev model
> is working very well and seeing adoption beyond the original idea of
> controlling VMs with SRIOV.

PF/VF isolation is a given. So the existing switchdev behavior is fine
in that regard. I wouldn't expect that to change. The fact is we
actually support something similar in the macvlan approach we put in
the Intel drivers since the macvlan itself provided an option for
isolation or to hairpin the traffic if you wanted to allow the VMDq
instances to be bypassed. That was another thing I view as a huge
feature as broadcast/multicast traffic can really get ugly when the
devices are all separate pipelines and being able to switch that off
and just do software replication and hairpinning can be extremely
useful.

> > I'm trying to remember which netdev conference it was. I referred to
> > this as a veth switchdev offload when something like this was first
> > brought up.
>
> Sure, though I think the way you'd create such a thing might be
> different. These APIs are really about creating an ADI that might be
> assigned to a VM and never have a netdev.
>
> It would be nonsense to create a veth-switchdev thing with out a
> netdev, and there have been various past attempts already NAK'd to
> transform a netdev into an ADI.
>
> Anyhow, if such a thing exists someday it could make sense to
> automatically substitute the HW version using a SF, if available.

The main problem as I see it is the fact that the SF interface is
bound too tightly to the hardware. The direct pass-thru with no
hairpin is always an option but if we are going to have an interface
where both ends are in the same system there are many cases where
always pushing all the packets off to the hardware don't necessarily
make sense.

> > could address those needs would be a good way to go for this as it
> > would force everyone to come together and define a standardized
> > feature set that all of the vendors would want to expose.
>
> I would say switchdev is already the standard feature set.

Yes, it is a standard feature set for the control plane. However for
the data-path it is somewhat limited as I feel it only describes what
goes through the switch. Not the interfaces that are exposed as the
endpoints. It is the problem of that last bit and how it is handled
that can make things ugly. For example the multicast/broadcast
replication problem that just occurred to me while writing this up.
The fact is for east-west traffic there has always been a problem with
the switchdev model as it limits everything to PCIe/DMA so there are
cases where software switches can outperform the hardware ones.