Re: [PATCH linux-next 2/9] vdpa: Introduce query of device config layout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 3/2/21 5:25 AM, Jason Wang wrote:
> 
> On 2021/3/1 8:06 下午, Sean Mooney wrote:
>> On Mon, 2021-03-01 at 11:28 +0100, Adrian Moreno wrote:
>>> On 3/1/21 8:50 AM, Jason Wang wrote:
>>>> On 2021/3/1 3:29 下午, Parav Pandit wrote:
>>>>>> From: Jason Wang <jasowang@xxxxxxxxxx>
>>>>>> Sent: Monday, March 1, 2021 9:29 AM
>>>>>>
>>>>>> On 2021/2/26 4:50 下午, Parav Pandit wrote:
>>>>>>>> From: Jason Wang <jasowang@xxxxxxxxxx>
>>>>>>>> Sent: Friday, February 26, 2021 1:56 PM
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2021/2/26 1:32 下午, Parav Pandit wrote:
>>>>>>>>>> From: Jason Wang <jasowang@xxxxxxxxxx>
>>>>>>>>>> Sent: Wednesday, February 24, 2021 2:18 PM
>>>>>>>>>>
>>>>>>>>>> On 2021/2/24 2:18 下午, Parav Pandit wrote:
>>>>>>>>>>> +
>>>>>>>>>>> +    if (nla_put_u16(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>>>>> +            config->max_virtqueue_pairs))
>>>>>>>>>>> +        return -EMSGSIZE;
>>>>>>>>>> We probably need to check VIRTIO_NET_F_RSS here.
>>>>>>>>> Yes. Will use it in v2.
>>>>>>>>>
>>>>>>>>>>> +    if (nla_put_u8(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,
>>>>>>>>>>> +               config->rss_max_key_size))
>>>>>>>>>>> +        return -EMSGSIZE;
>>>>>>>>>>> +
>>>>>>>>>>> +    rss_field = le16_to_cpu(config->rss_max_key_size);
>>>>>>>>>>> +    if (nla_put_u16(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN,
>>>>>>>>>> rss_field))
>>>>>>>>>>> +        return -EMSGSIZE;
>>>>>>>>>>> +
>>>>>>>>>>> +    hash_types = le32_to_cpu(config->supported_hash_types);
>>>>>>>>>>> +    if (nla_put_u32(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,
>>>>>>>>>>> +            config->supported_hash_types))
>>>>>>>>>>> +        return -EMSGSIZE;
>>>>>>>>>>> +    return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev,
>>>>>>>>>>> +struct sk_buff *msg) {
>>>>>>>>>>> +    struct virtio_net_config config = {};
>>>>>>>>>>> +
>>>>>>>>>>> +    vdev->config->get_config(vdev, 0, &config, sizeof(config));
>>>>>>>>>> Do we need to sync with other possible get_config calls here?
>>>>>>>>> To readers of get_config() is ok. No need to sync.
>>>>>>>>> However, I think set_config() and get_config() should sync otherwise
>>>>>>>> get_config can get partial value.
>>>>>>>>> Will probably have to add vdpa_device->config_access_lock().
>>>>>>>> Probably. And the set_config() should be synchronized with the
>>>>>>>> requrest that comes from vDPA bus.
>>>>>>> Yes, a rw semaphore is good enough.
>>>>>>> Device config fields can be change from the device side anyway, such as
>>>>>> link status anyway.
>>>>>>> Sync using lock shouldn’t be problem.
>>>>>>>
>>>>>>>> This makes me think whether we should go back to two phases method,
>>>>>>>> create and enable.
>>>>>>>>
>>>>>>>> The vDPA device is only registred after enabling, then there's no
>>>>>>>> need to sync with vDPA bus, and mgmt is not allowed to change config
>>>>>> after enalbing?
>>>>>>> In that case we should be able to disable it as well. Disable should take
>>>>>>> the
>>>>>> device of the bus.
>>>>>>> I find it weird to have this model to linger around the device without
>>>>>> anchoring on the bus.
>>>>>>> For example device is not yet enabled, so it is not anchored on the bus, but
>>>>>> its name is still have to remain unique.
>>>>>>
>>>>>>
>>>>>> Can we do some synchornization between vdpa device allocation and vdpa
>>>>>> device registier to solve the naming issue?
>>>>> mgmt tool querying the device config space after vdpa device is in use is
>>>>> real.
>>>>> So I don't see solving it any differently.
>>>>>
>>>>> Now upper layers of vdpa to not access the device on the placed on the vdpa
>>>>> bus, can be taken care by existing driver code using
>>>>> echo 0 > /sys/bus/vdpa/drivers_autoprobe
>>>>>
>>>>> By default vdpa device placed on the bus wont bind to any driver (net/vhost
>>>>> etc).
>>>>> 1. Decision to bind to which driver is done after config of the vdpa device is
>>>>> done.
>>>>> 2. orchestration sw does create and config before it binds to the right driver
>>>>> 3. which driver to bind to decided based on the use case of that individual
>>>>> vdpa device
>>>>> For example,
>>>>> (a) vdpa0 binds to net driver to have Netdev for container
>>>>> (b) vdpa1 binds to vhost driver to map vdpa device to VM
>>>>
>>>> I think it should work.
>>>>> Adding Adrian, Maxime and Sean for more comments.
>>> Is this a strong requirement or just a optional operation?
> 
> 
> So my understanding is that something needs to be done to make sure:
> 

> 1) the vpda device is not bound to virtio-vdpaThis is a problem of api not being unified, right? If vdpa device is bound to
virtio-vdpa I would expect some net-specific options (like mac or tso) would be
modifiable through net_device_ops if the needed capabilities are held.

> 2) config is not changed if it has been set by management
>For virtio-vdpa (virtio-net) devices, that would be controlled via CAP_NET_ADMIN.
For vhost-vdpa case, are you thinking of a way of stopping a user from sending
VHOST_VDPA_SET_CONFIG messages? or making it fail if management has made some
config at the vdpa level?

> 
>>> This might break some existing logic that expects netdev or vhost-vdpa device to
>>> exist before the some config (e.g:mac address) is set.
>> the orcestration software should not require permission to do dirver binding
>> and in the case of openstack will
>> not have the capablity to bind the vdpa device to a dirvier in the current
>> proposed design.
>> the ordering we are expecting to support in openstack at least initally is
>> that the adminsitrator
>> will use either systemd service files or udev rules to allocate the VF and
>> vdpa device staictly at boot.
>>
>> in the case of openstack we will expect all vdpa devies that will be used by
>> openstack(nova) for assignment
>> to vms will be prebound by the adminstrators to vhost-vdpa. openstack will
>> simply track the /dev/vhost* paths
>> corallated to the vdpa devices whose parent VF are allowed by the pci
>> passthough whitelist.
>>
>> if the adminstrator binds the vdpa device to virtio-vdpa so that
>> virtio-net-pci is used on the host nova and libvirt
>> will not correct that and the vm boot will fail. but provided you do not list
>> that parent VF in the whitelist nova
>> will also ignore it. so if you want to mix and match the vdpa device driver
>> that should not break anything provided
>> you do not list its parent in the pci whitelist.
>>
>> but one thing to keep in mind is whlie we are not planning to create VF/VDPA
>> devices dynamiclaly at lesat not for the initall
>> implentation we do expact to be able to configure attibute of the vdpa device
>> after tehy are create like any other nic.
>> for example the mac, mtu, link-state should all be configureable at runtime
>> via normal ip route2 commands.
> 
> 
> So in the case of vDPA, this task is expected to be done by vdpa tool. (There's
> no guarantee that vDPA is a VF).
> 
> 
>>
>> at the morment we are encounterng issue with the mac adress specificly which
>> qemu is hopefully going to work around since
>> libvirt and nova cannot currently configure the mac address of a vhost-vdpa
>> device so we are relying on qemu
>> to take the mac adress form the qemu command line so that it does not end up
>> as all 0 macs in the guest.
> 
> 
> Note that a kernel patch is posted by Eli[1] which assign a randomized mac. This
> should solve the issue in another way.
> 
> 
>> as some one that works on orcestration level i would have expect that setting
>> the mac/mtu on either the vf or the netdev
>> representor to match the deireed mac/mtu in the guest would have been
>> sufficent to alter the vdpa device confirutation.
>> likely  the netdev representor would make the most sence to alter to match the
>> desired mac so that the mac adress see on the ovs
>> bridge matches the mac of the vm.
> 
> 
> Speaking for genreal vDPA, it's not guaranteed that the vendor will implmenet
> the device with a representor. So vDPA tool is expected to be the only way to
> configure the device.
> 
> Thanks
> 
> [1] https://www.spinics.net/lists/linux-virtualization/msg48451.html
> 
> 
>> the current case where the vdpa representor netdev seams to have no bareing on
>> the vdpa device
>> config makes debugging much more difficalt and is likely to confuse monitoring
>> software. fortunetly openstack is not matching on the mac
>> address in this spefic case to identify the netdev represntor port but we have
>> other backend that do look at the mac. in the case of ovs
>> we store the uuid of the neutron port in the ovsdb when adding a port to ovs
>> so we rely on that not the mac but  for legacy mode
>> sriov which we hope to support with vdpa eventulally after we comple the
>> hardware offloaed ovs support, currenlty relys on the mac in some cases.
>>
>> i hope that answered your questions?
>> we are aware that we likely will have to support vdpa device creation at some
>> point if we intend to add support for vdpa devices
>> created form subfunctions or intels siov howver that was not ready when we
>> started this work so it was declared out of scope and
>> the curernt static approch was taken as the only path we could enable before
>> code free in 10 days.
>>
>> regards sean.
>>
>>>
>>> Thanks
>>>
>>
>> https://www.spinics.net/lists/linux-virtualization/msg48451.html
> 

-- 
Adrián Moreno

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization




[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux