Re: [RFC] vduse config write support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 24, 2024 at 11:45 AM Srivatsa Vaddagiri
<quic_svaddagi@xxxxxxxxxxx> wrote:
>
> Currently vduse does not seem to support configuration space writes
> (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> limitation? I am aware of the discussions that took place here:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@xxxxxxxxxxxxx/
>
> Perhaps writes can be supported *selectively* without violating safety concerns
> expressed in the above email discussion?

Adding more relevant people here.

It can probably be done case by case. The main reason for avoiding
config writing is

1) to prevent buggy/malicious userspace from hanging kernel driver for ever
2) to prevent buggy/malicious userspace device to break the semantic

Basically, it is the traditional trust model where the kernel doesn't
trust userspace.

E.g current virtio-blk has the following codes:

tatic ssize_t
cache_type_store(struct device *dev, struct device_attribute *attr,
         const char *buf, size_t count)
{
        struct gendisk *disk = dev_to_disk(dev);
        struct virtio_blk *vblk = disk->private_data;
        struct virtio_device *vdev = vblk->vdev;
        int i;

        BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
        i = sysfs_match_string(virtblk_cache_types, buf);
        if (i < 0)
        return i;

        virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
        virtblk_update_cache_mode(vdev);
        return count;
}

So basically the question is if we make the config write a posted
write or non-posted one.

If we make vduse config write a posted one, it means vduse doesn't
need to wait for the usersapce response. This is much more safe but it
breaks the above code as it looks like it requires a non-posted
semantic. We need to filter out virtio-blk or change the code to have
a read after the write:

while (virtio_cread8(vdev, offsetof(struct virtio_blk_config, wce) & XXX)
       /* sleep or other */

But we may suffer from the userspace who doesn't update the wce bit
forever, or maybe we can have a timeout there.

If we make vduse config write a non posted one, it means the vduse
needs to wait for the response from the userspace. It satisfies the
above code's assumption but it needs to deal with the buggy userspace
which might be challenging. Technically, we can have a device
emulation in the kernel but it looks like overkill for wce (or I don't
know how it can mandate wce for userspace devices).

I feel it might make sense for other devices that only require posted
write semantics.

>
> We are thinking of using vduse for hypervisor assisted virtio devices, which
> may need config write support and hence this question.
>
> To provide more details, we have a untrusted host that spins off a protected
> (confidential) guest VM on a Type-1 hypervisor (Gunyah). VMM in untrusted host
> leads to couple of issues:
>
> 1) Latency of (virtio) register access. VMM can take too long to respond with
> VCPU stalled all that while. I think vduse shares a similar concern, due to
> which it maintains a cache of configuratin registers inside kernel.

Maybe you can give an example for this? We cache the configuration
space to a faster access to that.

>
> 2) For PCI pass-through devices, we are concerned of letting VMM be in charge of
> emulating the complete configuration space (how can VM defend against invalid
> attributes presented for passthr devices)?

Virtio driver has been hardened for this, for example:

commit 72b5e8958738aaa453db5149e6ca3bcf416023b9
Author: Jason Wang <jasowang@xxxxxxxxxx>
Date:   Fri Jun 4 13:53:50 2021 +0800

    virtio-ring: store DMA metadata in desc_extra for split virtqueue

More hardening work is ongoing.

> I am aware of TDISP, but I think it
> may not be available for some of the devices on our platform.
>
> One option we are considering is for hypervisor to be in charge of virtio-PCI
> bus emulation, allowing only select devices (with recognized features) to be
> registered on the bus. VMM would need to register devices/features with
> hypervisor, which would verify it (as per some policy) and present them to VM on
> the virtio-PCI bus it would emulate. Protected VM should be shielded from
> invalid device configuration information that it may otherwise read from a
> compromised VMM.
>
> For virtio devices, the hypervisor would also service most register read/writes
> (to address concern #1), which implies it would need to cache a copy of the
> device configuration information (similar to vduse).
>
> We think vduse can be leveraged here to initialize the hypervisor cache of
> virtio registers. Basically have a vdpa-gunyah driver registered on the vdpa
> bus to which vduse devices are bound (rather than virtio-vdpa or vhost-vdpa).
> vdpa-gunyah driver can pull configuration information from vduse and pass that
> on to hypervisor. It will also help inject IRQ and pass on queue notifications
> (using hypervisor specific APIs).

Just to make sure I understand the design here, is vdpa-gunyah
expected to have a dedicated uAPI other than vhost-vDPA? Wonder any
reason why vhost-vDPA can be used here.

>
> We will however likely need vduse to support configuration writes (guest VM
> updating configuration space, for ex: writing to 'events_clear' field in case of
> virtio-gpu). Would vduse maintainers be willing to accept config_write support
> for select devices/features (as long as the writes don't violate any safety
> concerns we may have)?

I think so, looking at virtio_gpu_config_changed_work_func(), the
events_clear seems to be fine to have a posted semantic.

Maybe you can post an RFC to support config writing and let's start from there?

Thanks

>
> Thanks
> vatsa
>






[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux