Re: IOTLB support for vhost/vsock breaks crosvm on Android

Jason Wang <jasowang@xxxxxxxxxx> · Tue, 9 Aug 2022 11:12:39 +0800

On Sun, Aug 7, 2022 at 9:14 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> Will, thanks very much for the analysis and the writeup!
>
> On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote:
> > So how should we fix this? One possibility is for us to hack crosvm to
> > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features,
> > but others here have reasonably pointed out that they didn't expect a
> > kernel change to break userspace. On the flip side, the offending commit
> > in the kernel isn't exactly new (it's from the end of 2020!) and so it's
> > likely that others (e.g. QEMU) are using this feature.
>
> Exactly, that's the problem.
>
> vhost is reusing the virtio bits and it's only natural that
> what you are doing would happen.
>
> To be precise, this is what we expected people to do (and what QEMU does):
>
>
> #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) |
>                              (1 << VIRTIO_NET_F_RX_MRG) | .... )
>
> VHOST_GET_FEATURES(... &host_features);
> host_features &= QEMU_VHOST_FEATURES
> VHOST_SET_FEATURES(host_features & guest_features)
>
>
> Here QEMU_VHOST_FEATURES are the bits userspace knows about.
>
> Our assumption was that whatever userspace enables, it
> knows what the effect on vhost is going to be.
>
> But yes, I understand absolutely how someone would instead just use the
> guest features. It is unfortunate that we did not catch this in time.
>
>
> In hindsight, we should have just created vhost level macros
> instead of reusing virtio ones. Would address the concern
> about naming: PLATFORM_ACCESS makes sense for the
> guest since there it means "whatever access rules platform has",
> but for vhost a better name would be VHOST_F_IOTLB.

Yes, in the original patch it is called VHOST_F_DEVICE_IOTLB actually
to make it differ from virtio like VHOST_F_LOG_ALL etc. And I remember
I tried to post patch to avoid the bit duplication but the conclusion
is that it's too late for the changes.

> We should have also taken greater pains to document what
> we expect userspace to do. I remember now how I thought about something
> like this but after coding this up in QEMU I forgot to document this :(
> Also, I suspect given the history the GET/SET features ioctl and burned
> wrt extending it and we have to use a new when we add new features.
> All this we can do going forward.
>
>
> But what can we do about the specific issue?
> I am not 100% sure since as Will points out, QEMU and other
> userspace already rely on the current behaviour.
>
> Looking at QEMU specifically, it always sends some translations at
> startup, this in order to handle device rings.
>
> So, *maybe* we can get away with assuming that if no IOTLB ioctl was
> ever invoked then this userspace does not know about IOTLB and
> translation should ignore IOTLB completely.

I think this breaks the security assumptions of some setups.

>
> I am a bit nervous about breaking some *other* userspace which actually
> wants device to be blocked from accessing memory until IOTLB
> has been setup. If we get it wrong we are making guest
> and possibly even host vulnerable.

Yes.

> And of course just revering is not an option either since there
> are now whole stacks depending on the feature.
>
> Will I'd like your input on whether you feel a hack in the kernel
> is justified here.
>
> Also yes, I think it's a good idea to change crosvm anyway.  While the
> work around I describe might make sense upstream I don't think it's a
> reasonable thing to do in stable kernels.

+1

Thanks

> I think I'll prepare a patch documenting the legal vhost features
> as a 1st step even though crosvm is rust so it's not importing
> the header directly, right?
>
> --
> MST
>