Hi Sean,
On 10/29/22 12:51 AM, Sean Christopherson wrote:
On Fri, Oct 28, 2022, Gavin Shan wrote:
On 10/28/22 2:30 AM, Marc Zyngier wrote:
On Thu, 27 Oct 2022 18:44:51 +0100,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
On Thu, Oct 27, 2022, Marc Zyngier wrote:
On Tue, 25 Oct 2022 18:47:12 +0100, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
[...]
And ideally such bugs would detected without relying on userspace to
enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
time and was only found when mark_page_dirty_in_slot() started
WARNing.
I'm ok if arm64 wants to let userspace shoot itself in the foot with
the ITS, but I'm not ok dropping the protections in the common
mark_page_dirty_in_slot().
One somewhat gross idea would be to let architectures override the
"there must be a running vCPU" rule, e.g. arm64 could toggle a flag
in kvm->arch in its kvm_write_guest_lock() to note that an expected
write without a vCPU is in-progress:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8c5c69ba47a7..d1da8914f749 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
#ifdef CONFIG_HAVE_KVM_DIRTY_RING
- if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+ if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
+ return;
+
+ if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
return;
#endif
@@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
unsigned long rel_gfn = gfn - memslot->base_gfn;
u32 slot = (memslot->as_id << 16) | memslot->id;
- if (kvm->dirty_ring_size)
+ if (kvm->dirty_ring_size && vcpu)
kvm_dirty_ring_push(&vcpu->dirty_ring,
slot, rel_gfn);
- else
+ else if (memslot->dirty_bitmap)
set_bit_le(rel_gfn, memslot->dirty_bitmap);
}
}
...
A slightly different alternative would be have a completely separate
API for writing guest memory without an associated vCPU. I.e. start
building up proper device emulation support. Then the vCPU-based
APIs could yell if a vCPU isn't provided (or there is no running
vCPU in the current mess). And the deviced-based API could be
provided if and only if the architecture actually supports emulating
writes from devices, i.e. x86 would not opt-in and so would even
have access to the API.
Which is what I was putting under the "major surgery" label in my
previous email.
Anyhow, for the purpose of unblocking Gavin's series, I suggest to
adopt your per-arch opt-out suggestion as a stop gap measure, and we
will then be able to bike-shed for weeks on what the shape of the
device-originated memory dirtying API should be.
It's really a 'major surgery' and I would like to make sure I fully understand
'a completely separate API for writing guest memory without an associated vCPU",
before I'm going to working on v7 for this.
There are 7 functions and 2 macros involved as below. I assume Sean is suggesting
to add another argument, whose name can be 'has_vcpu', for these functions and macros?
No.
As March suggested, for your series just implement the hacky arch opt-out, don't
try and do surgery at this time as that's likely going to be a months-long effort
that touches a lot of cross-arch code.
E.g. I believe the ARM opt-out (opt-in?) for the above hack would be
bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
{
return vgic_has_its(kvm);
}
Ok, Thanks for your confirm. v7 was just posted to address comments from Marc,
Peter, Oliver and you. Please help to review when you get a chance.
https://lore.kernel.org/kvmarm/20221031003621.164306-1-gshan@xxxxxxxxxx/T/#t
Thanks,
Gavin