> On Mon, Nov 20, 2023 at 10:13:15AM +0000, Reshetova, Elena wrote:
> > Hi Stefan,
> >
> > Thank you for following up on this! Please find my comments inline.
> >
> > > -----Original Message-----
> > > From: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
> > > Sent: Thursday, November 16, 2023 10:03 PM
> > > To: Reshetova, Elena <elena.reshetova@xxxxxxxxx>
> > > Cc: Michael S. Tsirkin <mst@xxxxxxxxxx>; virtio-dev@xxxxxxxxxxxxxxxxxxxx;
> > > virtualization@xxxxxxxxxxxxxxx
> > > Subject: Using packed virtqueues in Confidential VMs
> > >
> > > Hi Elena,
> > > You raised concerns about using packed virtqueues with untrusted devices at
> > > Linux Plumbers Conference. I reviewed the specification and did not find
> > > fundamental issues that would preclude the use of packed virtqueues in
> > > untrusted devices. Do you have more information about issues with packed
> > > virtqueues?
> >
> > First of all a bit of clarification: our overall logic for making our first reference
> > release of Linux intel tdx stacks [1] was to enable only minimal required
> > functionality and this also applied to numerous modes that virtio provided.
> > Because for each enabled functionality we would have to do a code audit and
> > a proper fuzzing setup and all of this requires resources.
>
> However, both with packed and split I don't think speculation barriers
> have been added and they are likely to be needed.
> I wonder whether your fuzzing included attempts to force spectre like
> leaks based on speculation execution.

Right, the above was only for non-speculative things.
For speculation, I worked a while ago with the smatch maintainer to create a
new smatch pattern that is able to detect speculation (spectre v1 style)
issues across the whole kernel for the host <--> guest attack surface (similar
to what it does for the userspace <--> kernel boundary). It covers virtio as
well (modulo my mistakes or limits on function pointer propagation in smatch).
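
To make the kind of pattern concrete, below is a minimal userspace sketch
(not taken from virtio_ring.c; RING_SIZE, struct desc_state,
index_mask_nospec() and lookup_len() are hypothetical names used only for
illustration) of a spectre v1 style access on a host-controlled index and one
way to clamp it. The mask computation is modelled on the kernel's generic
array_index_mask_nospec() and assumes that the size fits in a signed long and
that right shift of a negative value is arithmetic, as with gcc and clang.

```c
#include <limits.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, for illustration only */

/* Hypothetical per-descriptor driver state. */
struct desc_state {
	uint32_t len;
};

/*
 * Returns all ones if index < size and zero otherwise, without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift on signed long.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >>
	       (sizeof(long) * CHAR_BIT - 1);
}

/*
 * Look up state for an id that came from the (untrusted) device.
 * The bounds check rejects bad ids architecturally; the mask forces the
 * index to 0 if the CPU speculates past a mispredicted check.
 */
static int lookup_len(const struct desc_state *state, unsigned long id,
		      uint32_t *len)
{
	if (id >= RING_SIZE)
		return -1;

	id &= index_mask_nospec(id, RING_SIZE);
	*len = state[id].len;
	return 0;
}
```

In kernel code one would use array_index_nospec() from <linux/nospec.h>
rather than open-coding the mask. Where exactly such clamping is needed in
virtio is what the smatch reports would have to tell us.
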
Here are the patterns in smatch:
https://repo.or.cz/smatch.git/blob/045d29f90c4ab21c374ff587b856f3c30368750f:/smatch_kernel_host_data.c
https://repo.or.cz/smatch.git/blob/045d29f90c4ab21c374ff587b856f3c30368750f:/smatch_points_to_host_data.c

However, due to lack of resources at that time, we have not been able to
investigate the reported issues, and nothing has been done in the whole kernel
about this. If you are interested in doing this exercise for virtio, I will be
happy to work with you on it. I think we can first verify that it reports
everything it needs to in order to cover virtio fully and correctly, and then
review where it would make sense to insert the speculation barriers.

> > The choice of split queue was a natural first step since it is the most straightforward
> > to understand (at least it was for us, bear in mind we are not experts in virtio as
> > you are) and the fact that there was work already done ([2] and other patches)
> > to harden the descriptors for split queue.
> >
> > [1] https://github.com/intel/tdx-tools
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/virtio?h=v6.6-rc4&id=72b5e8958738aaa453db5149e6ca3bcf416023b9
> >
> > I remember looking at the packed queue long ago and noticing that at least
> > some descriptor fields are under device control and I wasn’t sure why the similar
> > hardening was not done here as for the split case.
>
> packed has R/W descriptors. This means however that data is copied over
> from descriptor and validated before use.

Makes sense. We didn’t know you had done an audit on this, so this is then
covered, but ideally it would also have to be validated with fuzzing in this mode.
This brings me to another point that I wanted to ask: when you did the
hardening audit, what did you consider as points of untrusted inputs?
Is it just attributes of virtio-related structures (queues, etc.) exposed to the host,
or did you also look at the PCI config space that virtio uses as an attack vector?

> > However, we had many
> > issues to handle in past, and since we didn’t need the packed queue, we
> > never went to investigate this further.
> > It is also possible that we simply misunderstood the code at that point.
> >
> > > I also reviewed Linux's virtio_ring.c to look for implementation issues. One
> > > thing I noticed was that detach_buf_packed -> vring_unmap_desc_packed trusts
> > > the fields of indirect descriptors that have been mapped to the device:
> > >
> > >   flags = le16_to_cpu(desc->flags);
> > >
> > >   dma_unmap_page(vring_dma_dev(vq),
> > >                  le64_to_cpu(desc->addr),
> > >                  le32_to_cpu(desc->len),
> > >                  (flags & VRING_DESC_F_WRITE) ?
> > >                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > >
> > > This could be problematic if the device is able to modify indirect descriptors.
> > > However, the indirect descriptor table is mapped with DMA_TO_DEVICE:
> > >
> > >   addr = vring_map_single(vq, desc,
> > >                           total_sg * sizeof(struct vring_packed_desc),
> > >                           DMA_TO_DEVICE);
> > >
> > > There is no problem when there is an enforcing IOMMU that maps the page with
> > > read-only permissions but that's not always the case.
> >
> > We don’t use IOMMU at the moment for the confidential guest, since we don’t
> > have to (memory is encrypted/protected) and only explicitly shared pages are
> > available for the host/devices to modify.
> > Do I understand it correctly that in our case the indirect descriptor table will
> > end up mapped shared for this mode to work and then there is no protection?
>
> I think this whole table is copied to swiotlb (this is what DMA_TO_DEVICE
> AFAIK).

Yes, in this case we don’t have race/double fetch issues, but the main issue
seems to remain.

Best Regards,
Elena