Re: IOTLB support for vhost/vsock breaks crosvm on Android

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Will, thanks very much for the analysis and the writeup!

On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote:
> So how should we fix this? One possibility is for us to hack crosvm to
> clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features,
> but others here have reasonably pointed out that they didn't expect a
> kernel change to break userspace. On the flip side, the offending commit
> in the kernel isn't exactly new (it's from the end of 2020!) and so it's
> likely that others (e.g. QEMU) are using this feature.

Exactly, that's the problem.

vhost is reusing the virtio bits and it's only natural that
what you are doing would happen.

To be precise, this is what we expected people to do (and what QEMU does):


#define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) |
			     (1 << VIRTIO_NET_F_RX_MRG) | .... )

VHOST_GET_FEATURES(... &host_features);
host_features &= QEMU_VHOST_FEATURES
VHOST_SET_FEATURES(host_features & guest_features)


Here QEMU_VHOST_FEATURES are the bits userspace knows about.

Our assumption was that whatever userspace enables, it
knows what the effect on vhost is going to be.

But yes, I understand absolutely how someone would instead just use the
guest features. It is unfortunate that we did not catch this in time.


In hindsight, we should have just created vhost level macros
instead of reusing virtio ones. Would address the concern
about naming: PLATFORM_ACCESS makes sense for the
guest since there it means "whatever access rules platform has",
but for vhost a better name would be VHOST_F_IOTLB.
We should have also taken greater pains to document what
we expect userspace to do. I remember now how I thought about something
like this but after coding this up in QEMU I forgot to document this :(
Also, I suspect given the history the GET/SET features ioctl and burned
wrt extending it and we have to use a new when we add new features.
All this we can do going forward.


But what can we do about the specific issue?
I am not 100% sure since as Will points out, QEMU and other
userspace already rely on the current behaviour.

Looking at QEMU specifically, it always sends some translations at
startup, this in order to handle device rings.

So, *maybe* we can get away with assuming that if no IOTLB ioctl was
ever invoked then this userspace does not know about IOTLB and
translation should ignore IOTLB completely.

I am a bit nervous about breaking some *other* userspace which actually
wants device to be blocked from accessing memory until IOTLB
has been setup. If we get it wrong we are making guest
and possibly even host vulnerable.
And of course just revering is not an option either since there
are now whole stacks depending on the feature.

Will I'd like your input on whether you feel a hack in the kernel
is justified here.

Also yes, I think it's a good idea to change crosvm anyway.  While the
work around I describe might make sense upstream I don't think it's a
reasonable thing to do in stable kernels.
I think I'll prepare a patch documenting the legal vhost features
as a 1st step even though crosvm is rust so it's not importing
the header directly, right?

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux