On 3/2/21 5:25 AM, Jason Wang wrote: > > On 2021/3/1 8:06 下午, Sean Mooney wrote: >> On Mon, 2021-03-01 at 11:28 +0100, Adrian Moreno wrote: >>> On 3/1/21 8:50 AM, Jason Wang wrote: >>>> On 2021/3/1 3:29 下午, Parav Pandit wrote: >>>>>> From: Jason Wang <jasowang@xxxxxxxxxx> >>>>>> Sent: Monday, March 1, 2021 9:29 AM >>>>>> >>>>>> On 2021/2/26 4:50 下午, Parav Pandit wrote: >>>>>>>> From: Jason Wang <jasowang@xxxxxxxxxx> >>>>>>>> Sent: Friday, February 26, 2021 1:56 PM >>>>>>>> >>>>>>>> >>>>>>>> On 2021/2/26 1:32 下午, Parav Pandit wrote: >>>>>>>>>> From: Jason Wang <jasowang@xxxxxxxxxx> >>>>>>>>>> Sent: Wednesday, February 24, 2021 2:18 PM >>>>>>>>>> >>>>>>>>>> On 2021/2/24 2:18 下午, Parav Pandit wrote: >>>>>>>>>>> + >>>>>>>>>>> + if (nla_put_u16(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP, >>>>>>>>>>> + config->max_virtqueue_pairs)) >>>>>>>>>>> + return -EMSGSIZE; >>>>>>>>>> We probably need to check VIRTIO_NET_F_RSS here. >>>>>>>>> Yes. Will use it in v2. >>>>>>>>> >>>>>>>>>>> + if (nla_put_u8(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN, >>>>>>>>>>> + config->rss_max_key_size)) >>>>>>>>>>> + return -EMSGSIZE; >>>>>>>>>>> + >>>>>>>>>>> + rss_field = le16_to_cpu(config->rss_max_key_size); >>>>>>>>>>> + if (nla_put_u16(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, >>>>>>>>>> rss_field)) >>>>>>>>>>> + return -EMSGSIZE; >>>>>>>>>>> + >>>>>>>>>>> + hash_types = le32_to_cpu(config->supported_hash_types); >>>>>>>>>>> + if (nla_put_u32(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES, >>>>>>>>>>> + config->supported_hash_types)) >>>>>>>>>>> + return -EMSGSIZE; >>>>>>>>>>> + return 0; >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, >>>>>>>>>>> +struct sk_buff *msg) { >>>>>>>>>>> + struct virtio_net_config config = {}; >>>>>>>>>>> + >>>>>>>>>>> + vdev->config->get_config(vdev, 0, &config, sizeof(config)); >>>>>>>>>> Do we need to sync with other possible get_config calls here? >>>>>>>>> To readers of get_config() is ok. No need to sync. >>>>>>>>> However, I think set_config() and get_config() should sync otherwise >>>>>>>> get_config can get partial value. >>>>>>>>> Will probably have to add vdpa_device->config_access_lock(). >>>>>>>> Probably. And the set_config() should be synchronized with the >>>>>>>> requrest that comes from vDPA bus. >>>>>>> Yes, a rw semaphore is good enough. >>>>>>> Device config fields can be change from the device side anyway, such as >>>>>> link status anyway. >>>>>>> Sync using lock shouldn’t be problem. >>>>>>> >>>>>>>> This makes me think whether we should go back to two phases method, >>>>>>>> create and enable. >>>>>>>> >>>>>>>> The vDPA device is only registred after enabling, then there's no >>>>>>>> need to sync with vDPA bus, and mgmt is not allowed to change config >>>>>> after enalbing? >>>>>>> In that case we should be able to disable it as well. Disable should take >>>>>>> the >>>>>> device of the bus. >>>>>>> I find it weird to have this model to linger around the device without >>>>>> anchoring on the bus. >>>>>>> For example device is not yet enabled, so it is not anchored on the bus, but >>>>>> its name is still have to remain unique. >>>>>> >>>>>> >>>>>> Can we do some synchornization between vdpa device allocation and vdpa >>>>>> device registier to solve the naming issue? >>>>> mgmt tool querying the device config space after vdpa device is in use is >>>>> real. >>>>> So I don't see solving it any differently. >>>>> >>>>> Now upper layers of vdpa to not access the device on the placed on the vdpa >>>>> bus, can be taken care by existing driver code using >>>>> echo 0 > /sys/bus/vdpa/drivers_autoprobe >>>>> >>>>> By default vdpa device placed on the bus wont bind to any driver (net/vhost >>>>> etc). >>>>> 1. Decision to bind to which driver is done after config of the vdpa device is >>>>> done. >>>>> 2. orchestration sw does create and config before it binds to the right driver >>>>> 3. which driver to bind to decided based on the use case of that individual >>>>> vdpa device >>>>> For example, >>>>> (a) vdpa0 binds to net driver to have Netdev for container >>>>> (b) vdpa1 binds to vhost driver to map vdpa device to VM >>>> >>>> I think it should work. >>>>> Adding Adrian, Maxime and Sean for more comments. >>> Is this a strong requirement or just a optional operation? > > > So my understanding is that something needs to be done to make sure: > > 1) the vpda device is not bound to virtio-vdpaThis is a problem of api not being unified, right? If vdpa device is bound to virtio-vdpa I would expect some net-specific options (like mac or tso) would be modifiable through net_device_ops if the needed capabilities are held. > 2) config is not changed if it has been set by management >For virtio-vdpa (virtio-net) devices, that would be controlled via CAP_NET_ADMIN. For vhost-vdpa case, are you thinking of a way of stopping a user from sending VHOST_VDPA_SET_CONFIG messages? or making it fail if management has made some config at the vdpa level? > >>> This might break some existing logic that expects netdev or vhost-vdpa device to >>> exist before the some config (e.g:mac address) is set. >> the orcestration software should not require permission to do dirver binding >> and in the case of openstack will >> not have the capablity to bind the vdpa device to a dirvier in the current >> proposed design. >> the ordering we are expecting to support in openstack at least initally is >> that the adminsitrator >> will use either systemd service files or udev rules to allocate the VF and >> vdpa device staictly at boot. >> >> in the case of openstack we will expect all vdpa devies that will be used by >> openstack(nova) for assignment >> to vms will be prebound by the adminstrators to vhost-vdpa. openstack will >> simply track the /dev/vhost* paths >> corallated to the vdpa devices whose parent VF are allowed by the pci >> passthough whitelist. >> >> if the adminstrator binds the vdpa device to virtio-vdpa so that >> virtio-net-pci is used on the host nova and libvirt >> will not correct that and the vm boot will fail. but provided you do not list >> that parent VF in the whitelist nova >> will also ignore it. so if you want to mix and match the vdpa device driver >> that should not break anything provided >> you do not list its parent in the pci whitelist. >> >> but one thing to keep in mind is whlie we are not planning to create VF/VDPA >> devices dynamiclaly at lesat not for the initall >> implentation we do expact to be able to configure attibute of the vdpa device >> after tehy are create like any other nic. >> for example the mac, mtu, link-state should all be configureable at runtime >> via normal ip route2 commands. > > > So in the case of vDPA, this task is expected to be done by vdpa tool. (There's > no guarantee that vDPA is a VF). > > >> >> at the morment we are encounterng issue with the mac adress specificly which >> qemu is hopefully going to work around since >> libvirt and nova cannot currently configure the mac address of a vhost-vdpa >> device so we are relying on qemu >> to take the mac adress form the qemu command line so that it does not end up >> as all 0 macs in the guest. > > > Note that a kernel patch is posted by Eli[1] which assign a randomized mac. This > should solve the issue in another way. > > >> as some one that works on orcestration level i would have expect that setting >> the mac/mtu on either the vf or the netdev >> representor to match the deireed mac/mtu in the guest would have been >> sufficent to alter the vdpa device confirutation. >> likely the netdev representor would make the most sence to alter to match the >> desired mac so that the mac adress see on the ovs >> bridge matches the mac of the vm. > > > Speaking for genreal vDPA, it's not guaranteed that the vendor will implmenet > the device with a representor. So vDPA tool is expected to be the only way to > configure the device. > > Thanks > > [1] https://www.spinics.net/lists/linux-virtualization/msg48451.html > > >> the current case where the vdpa representor netdev seams to have no bareing on >> the vdpa device >> config makes debugging much more difficalt and is likely to confuse monitoring >> software. fortunetly openstack is not matching on the mac >> address in this spefic case to identify the netdev represntor port but we have >> other backend that do look at the mac. in the case of ovs >> we store the uuid of the neutron port in the ovsdb when adding a port to ovs >> so we rely on that not the mac but for legacy mode >> sriov which we hope to support with vdpa eventulally after we comple the >> hardware offloaed ovs support, currenlty relys on the mac in some cases. >> >> i hope that answered your questions? >> we are aware that we likely will have to support vdpa device creation at some >> point if we intend to add support for vdpa devices >> created form subfunctions or intels siov howver that was not ready when we >> started this work so it was declared out of scope and >> the curernt static approch was taken as the only path we could enable before >> code free in 10 days. >> >> regards sean. >> >>> >>> Thanks >>> >> >> https://www.spinics.net/lists/linux-virtualization/msg48451.html > -- Adrián Moreno _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization