Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

Joao Martins <joao.m.martins@xxxxxxxxxx> · Wed, 2 Dec 2020 10:44:03 +0000

[late response - was on holiday yesterday]

On 12/2/20 12:40 AM, Ankur Arora wrote:
> On 2020-12-01 5:07 a.m., David Woodhouse wrote:
>> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>>> +static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
>>> +{
>>> +       struct shared_info *shared_info;
>>> +       struct page *page;
>>> +
>>> +       page = gfn_to_page(kvm, gfn);
>>> +       if (is_error_page(page))
>>> +               return -EINVAL;
>>> +
>>> +       kvm->arch.xen.shinfo_addr = gfn;
>>> +
>>> +       shared_info = page_to_virt(page);
>>> +       memset(shared_info, 0, sizeof(struct shared_info));
>>> +       kvm->arch.xen.shinfo = shared_info;
>>> +       return 0;
>>> +}
>>> +
>>
>> Hm.
>>
>> How come we get to pin the page and directly dereference it every time,
>> while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
>> instead?
> 
> So looking at my WIP trees from the time, this is something that
> we went back and forth on as well with using just a pinned page or a
> persistent kvm_vcpu_map().
> 
> I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
> as shared_info is created early and is not expected to change during the
> lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
> MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
> kvm_vcpu_unmap() dance or do some kind of synchronization.
> 
> That said, I don't think this code explicitly disallows any updates
> to shared_info.
> 
>>
>> If that was allowed, wouldn't it have been a much simpler fix for
>> CVE-2019-3016? What am I missing?
> 
> Agreed.
> 
> Perhaps, Paolo can chime in with why KVM never uses pinned page
> and always prefers to do cached mappings instead?
> 
Part of the CVE fix was to not use the cached versions.

It's not a long-term pin of the page, unlike what we try to do here (partly due to the nature
of the pages we are mapping), but we still map the gpa, RMW the steal time struct, and
then unmap the page.

See record_steal_time() -- but more specifically commit b043138246 ("x86/KVM: Make sure
KVM_VCPU_FLUSH_TLB flag is not missed").
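
For anyone following along, here is a rough user-space model of that map / RMW / unmap
sequence. It is only a sketch: every model_* name and MODEL_* constant is made up for
illustration, the struct is a trimmed stand-in for the guest's steal time area, and C11
atomics and fences stand in for the kernel's xchg() and write barriers. It is not the
kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, modelled on the "preempted" byte. */
#define MODEL_PREEMPTED (1u << 0)
#define MODEL_FLUSH_TLB (1u << 1)

/* Trimmed, hypothetical stand-in for the guest-visible steal time area. */
struct steal_time_model {
	uint64_t steal;
	uint32_t version;
	_Atomic uint8_t preempted;
};

/* Stand-in for a guest page; "mapped" counts live mappings. */
struct guest_page_model {
	struct steal_time_model st;
	int mapped;
};

static struct steal_time_model *model_map(struct guest_page_model *p)
{
	p->mapped++;
	return &p->st;
}

static void model_unmap(struct guest_page_model *p)
{
	assert(p->mapped > 0);
	p->mapped--;
}

/* Returns true if the guest had requested a TLB flush. */
static bool model_record_steal_time(struct guest_page_model *p, uint64_t delta)
{
	struct steal_time_model *st = model_map(p);

	/*
	 * Fetch and clear the flags in one atomic step, so a flush request
	 * set concurrently by the guest cannot be lost between a separate
	 * read and write.
	 */
	uint8_t old = atomic_exchange(&st->preempted, 0);
	bool flush = (old & MODEL_FLUSH_TLB) != 0;

	st->version += 1;			/* odd: update in progress */
	atomic_thread_fence(memory_order_release);
	st->steal += delta;
	atomic_thread_fence(memory_order_release);
	st->version += 1;			/* even: update complete */

	model_unmap(p);				/* mapping is short-lived */
	return flush;
}
```

The point of the model is that the mapping only lives for the duration of one update,
which is the difference from the long-term pin in this patch.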

But I am not sure it's a good idea to follow the same approach as record_steal_time(), given
that this is a fairly sensitive code path for event channels.

>>
>> Should I rework these to use kvm_write_guest_cached()?
> 
> kvm_vcpu_map() would be better. The event channel logic does RMW operations
> on shared_info->vcpu_info.
> 
Indeed, yes.

Ankur, IIRC we saw missed event channel notifications when we were using the
{write,read}_cached() version of the patch.

But I can't remember which word was the cause, evtchn_pending or the mask
word -- either would make it not inject an upcall.
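
To make the dependency on those two words concrete, below is a small user-space sketch of
2-level event channel delivery as I understand it from the Xen ABI: set the port's bit in
evtchn_pending, check evtchn_mask, then set the per-vCPU selector bit and the upcall flag.
The struct layouts are simplified and all model_* / MODEL_* names are hypothetical; C11
atomics stand in for the kernel bitops. This is not the code from the series.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_NR_WORDS	32	/* hypothetical size, kept small for the model */

/* Simplified stand-ins for the shared_info and vcpu_info fields used here. */
struct shinfo_model {
	_Atomic unsigned long evtchn_pending[MODEL_NR_WORDS];
	_Atomic unsigned long evtchn_mask[MODEL_NR_WORDS];
};

struct vcpu_info_model {
	_Atomic unsigned char evtchn_upcall_pending;
	_Atomic unsigned long evtchn_pending_sel;
};

/* Atomic RMW: set the bit and report whether it was already set. */
static bool model_test_and_set_bit(_Atomic unsigned long *word, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Returns true if an upcall should be injected for this port. */
static bool model_set_pending(struct shinfo_model *s, struct vcpu_info_model *v,
			      unsigned int port)
{
	unsigned int word = port / MODEL_WORD_BITS;
	unsigned int bit = port % MODEL_WORD_BITS;

	assert(word < MODEL_NR_WORDS);

	/* Already pending: the guest has been (or will be) notified. */
	if (model_test_and_set_bit(&s->evtchn_pending[word], bit))
		return false;

	/* Masked: stays pending, but no upcall for now. */
	if (atomic_load(&s->evtchn_mask[word]) & (1UL << bit))
		return false;

	/* Selector bit already set: an upcall for this word is outstanding. */
	if (model_test_and_set_bit(&v->evtchn_pending_sel, word))
		return false;

	atomic_store(&v->evtchn_upcall_pending, 1);
	return true;
}
```

In this model a stale view of either word is enough to skip the upcall: a pending bit that
looks already set, or a mask bit that looks set, both return early. Whether that is what we
actually hit with the cached accessors is an assumption on my part.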

	Joao