On 2020-12-02 4:20 a.m., David Woodhouse wrote:
On Wed, 2020-12-02 at 10:44 +0000, Joao Martins wrote:
[late response - was on holiday yesterday]
On 12/2/20 12:40 AM, Ankur Arora wrote:
On 2020-12-01 5:07 a.m., David Woodhouse wrote:
On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+ struct shared_info *shared_info;
+ struct page *page;
+
+ page = gfn_to_page(kvm, gfn);
+ if (is_error_page(page))
+ return -EINVAL;
+
+ kvm->arch.xen.shinfo_addr = gfn;
+
+ shared_info = page_to_virt(page);
+ memset(shared_info, 0, sizeof(struct shared_info));
+ kvm->arch.xen.shinfo = shared_info;
+ return 0;
+}
+
Hm.
How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?
So looking at my WIP trees from the time, this is something that
we went back and forth on as well with using just a pinned page or a
persistent kvm_vcpu_map().
I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
as shared_info is created early and is not expected to change during the
lifetime of the guest which didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME) so that would either need to do a kvm_vcpu_map()
kvm_vcpu_unmap() dance or do some kind of synchronization.
That said, I don't think this code explicitly disallows any updates
to shared_info.
If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?
Agreed.
Perhaps, Paolo can chime in with why KVM never uses pinned page
and always prefers to do cached mappings instead?
Part of the CVE fix to not use cached versions.
It's not a longterm pin of the page unlike we try to do here (partly due to the nature
of the pages we are mapping) but we still we map the gpa, RMW the steal time struct, and
then unmap the page.
See record_steal_time() -- but more specifically commit b043138246 ("x86/KVM: Make sure
KVM_VCPU_FLUSH_TLB flag is not missed").
But I am not sure it's a good idea to follow the same as record_steal_time() given that
this is a fairly sensitive code path for event channels.
Right. We definitely need to use atomic RMW operations (like the CVE
fix did) so the page needs to be *mapped*.
My question was about a permanent pinned mapping vs the map/unmap as we
need it that record_steal_time() does.
On IRC, Paolo told me that permanent pinning causes problems for memory
hotplug, and pointed me at the trick we do with an MMU notifier and
kvm_vcpu_reload_apic_access_page().
Okay that answers my question. Thanks for clearing that up.
Not sure of a good place to document this but it would be good to
have this written down somewhere. Maybe kvm_map_gfn()?
I'm going to stick with the pinning we have for the moment, and just
fix up the fact that it leaks the pinned pages if the guest sets the
shared_info address more than once.
At some point the apic page MMU notifier thing can be made generic, and
we can use that for this and for KVM steal time too.
Yeah, that's something that'll definitely be good to have.
Ankur