On Sun, Aug 7, 2022 at 9:14 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > > Will, thanks very much for the analysis and the writeup! > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > So how should we fix this? One possibility is for us to hack crosvm to > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > but others here have reasonably pointed out that they didn't expect a > > kernel change to break userspace. On the flip side, the offending commit > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > likely that others (e.g. QEMU) are using this feature. > > Exactly, that's the problem. > > vhost is reusing the virtio bits and it's only natural that > what you are doing would happen. > > To be precise, this is what we expected people to do (and what QEMU does): > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > VHOST_GET_FEATURES(... &host_features); > host_features &= QEMU_VHOST_FEATURES > VHOST_SET_FEATURES(host_features & guest_features) > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > Our assumption was that whatever userspace enables, it > knows what the effect on vhost is going to be. > > But yes, I understand absolutely how someone would instead just use the > guest features. It is unfortunate that we did not catch this in time. > > > In hindsight, we should have just created vhost level macros > instead of reusing virtio ones. Would address the concern > about naming: PLATFORM_ACCESS makes sense for the > guest since there it means "whatever access rules platform has", > but for vhost a better name would be VHOST_F_IOTLB. Yes, in the original patch it is called VHOST_F_DEVICE_IOTLB actually to make it differ from virtio like VHOST_F_LOG_ALL etc. And I remember I tried to post patch to avoid the bit duplication but the conclusion is that it's too late for the changes. > We should have also taken greater pains to document what > we expect userspace to do. I remember now how I thought about something > like this but after coding this up in QEMU I forgot to document this :( > Also, I suspect given the history the GET/SET features ioctl and burned > wrt extending it and we have to use a new when we add new features. > All this we can do going forward. > > > But what can we do about the specific issue? > I am not 100% sure since as Will points out, QEMU and other > userspace already rely on the current behaviour. > > Looking at QEMU specifically, it always sends some translations at > startup, this in order to handle device rings. > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > ever invoked then this userspace does not know about IOTLB and > translation should ignore IOTLB completely. I think this breaks the security assumptions of some setups. > > I am a bit nervous about breaking some *other* userspace which actually > wants device to be blocked from accessing memory until IOTLB > has been setup. If we get it wrong we are making guest > and possibly even host vulnerable. Yes. > And of course just revering is not an option either since there > are now whole stacks depending on the feature. > > Will I'd like your input on whether you feel a hack in the kernel > is justified here. > > Also yes, I think it's a good idea to change crosvm anyway. While the > work around I describe might make sense upstream I don't think it's a > reasonable thing to do in stable kernels. +1 Thanks > I think I'll prepare a patch documenting the legal vhost features > as a 1st step even though crosvm is rust so it's not importing > the header directly, right? > > -- > MST >