Hi Peter and Marc,
On 9/28/22 4:21 AM, Peter Xu wrote:
On Tue, Sep 27, 2022 at 01:32:07PM -0400, Marc Zyngier wrote:
On Tue, 27 Sep 2022 12:02:52 -0400,
Peter Xu <peterx@xxxxxxxxxx> wrote:
[...]
Any decision made on how to tackle the GIC status dirty bits?
Which dirty bits? Are you talking of the per-RD pending bits?
Gavin found, off-list, that some dirty pfn paths on aarch64 may not have a
vcpu context.
Borrowing Gavin's trace dump:
el0t_64_sync
el0t_64_sync_handler
el0_svc
do_el0_svc
__arm64_sys_ioctl
kvm_device_ioctl
vgic_its_set_attr
vgic_its_save_tables_v0
kvm_write_guest
__kvm_write_guest_page
mark_page_dirty_in_slot
With the current code it'll trigger the warning in mark_page_dirty_in_slot().
A userspace approach is doable by treating these pages as always dirty in
userspace (QEMU), but even then, IIUC, we'd need to drop the warning in
mark_page_dirty_in_slot() and treat a no-vcpu dirty as an expected no-op.
I'll leave the details to Gavin.
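For reference, the check in question looks roughly like the sketch below (a
simplified rendering of mark_page_dirty_in_slot() in virt/kvm/kvm_main.c, not
the exact code):

  void mark_page_dirty_in_slot(struct kvm *kvm,
                               const struct kvm_memory_slot *memslot, gfn_t gfn)
  {
          struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
          /*
           * With the dirty ring there is nowhere to push an entry without a
           * running vcpu, so the write is warned about and dropped. This is
           * what the ITS table save path trips over.
           */
          if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
                  return;
  #endif

          if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
                  unsigned long rel_gfn = gfn - memslot->base_gfn;
                  u32 slot = (memslot->as_id << 16) | memslot->id;

                  if (kvm->dirty_ring_size)
                          kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);
                  else
                          set_bit_le(rel_gfn, memslot->dirty_bitmap);
          }
  }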
Thanks to Peter for bringing the issue here for further discussion. I'm
slowly converging on an idea/design to resolve it. If Marc agrees, it needn't
stop us from merging this series, since the newly introduced capability is
only used by kvm/selftests/dirty_log_test; QEMU needs further changes anyway
to enable the feature through KVM_CAP_DIRTY_LOG_RING_ACQ_REL. I can post
follow-up patches to fix the VGIC/ITS issue.
We had some internal discussion about the issue. It comes down to the various
VGIC/ITS tables listed below. QEMU asks KVM to save those tables to guest
memory on migration or shutdown. Unfortunately, there is no running vcpu on
that path, so the corresponding dirty bits are dropped with warning messages
in mark_page_dirty_in_slot().
---> Command to save VGIC/ITS tables
group: KVM_DEV_ARM_VGIC_GRP_CTRL
attr : KVM_DEV_ARM_ITS_SAVE_TABLES
---> VGIC/ITS tables to be saved
Device Table // REG_GITS_BASER_0
Interrupt Translation Tables // GITS_CMD_MAPD
Collection Table // REG_GITS_BASER_1
---> QEMU backtraces triggering the issue
     Shutdown path:
       main
       qemu_main
       qemu_cleanup
       vm_shutdown
       do_vm_stop
       vm_state_notify
       vm_change_state_handler        // In hw/intc/arm_gicv3_its_kvm.c

     Migration path:
       thread_start
       start_thread
       qemu_thread_start
       migration_thread
       migration_iteration_run
       migration_completion
       vm_stop_force_state
       do_vm_stop
       vm_state_notify
       vm_change_state_handler        // In hw/intc/arm_gicv3_its_kvm.c
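For completeness, the table save that both paths end up issuing boils down to
a KVM device attribute write like the sketch below. It is only illustrative:
the real code sits behind QEMU's wrappers in hw/intc/arm_gicv3_its_kvm.c, and
'its_dev_fd' is a placeholder for the already-created ITS device fd.

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int its_save_tables(int its_dev_fd)
  {
          struct kvm_device_attr attr = {
                  .group = KVM_DEV_ARM_VGIC_GRP_CTRL,
                  .attr  = KVM_DEV_ARM_ITS_SAVE_TABLES,
          };

          /*
           * This ends up in vgic_its_save_tables_v0(), which calls
           * kvm_write_guest() without any vcpu context -- the very path
           * shown in the kernel backtrace above.
           */
          return ioctl(its_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
  }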
I have a rough idea, outlined below. Comments are appreciated before I go
ahead with a prototype. The overall idea is to introduce another dirty ring
for KVM itself (kvm-dirty-ring), updated and consumed separately from the
per-vcpu dirty ring (vcpu-dirty-ring).
- When the various VGIC/ITS table base addresses are specified, kvm-dirty-ring
  entries are added to mark those pages as 'always-dirty'. In
  mark_page_dirty_in_slot(), 'always-dirty' pages are then skipped and no
  entries are pushed to the vcpu-dirty-ring (see the sketch after this list).
- Similar to the vcpu-dirty-ring, the kvm-dirty-ring is accessed from userspace
  through mmap(kvm->fd). However, there won't be a similar reset interface,
  which means 'struct kvm_dirty_gfn::flags' won't track any state as it does
  for the vcpu-dirty-ring. In this regard, the kvm-dirty-ring is purely a
  shared buffer that advertises 'always-dirty' pages from the host to
  userspace.
- For QEMU, the shutdown/suspend/resume cases are no longer a concern. The
  only remaining case is migration. When migration is about to complete, the
  kvm-dirty-ring entries are fetched and the dirty bits are propagated to the
  global dirty page bitmap and the RAMBlock's dirty page bitmap. I'm still
  reading the code to find the best spot to do this.
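To make the first two points more concrete, here is a very rough sketch. All
names below (the 'always_dirty' ring in struct kvm and both helpers) are
hypothetical and only meant to show the intended flow, not a final interface:

  /* Hypothetical kvm-dirty-ring: an advertise-only buffer, no reset side. */
  struct kvm_always_dirty_ring {
          u32 nentries;                   /* ring size                    */
          u32 next;                       /* write index, host side only  */
          struct kvm_dirty_gfn *gfns;     /* mmap()ed from kvm->fd        */
  };

  /* Called when a VGIC/ITS table base address is registered. */
  static void kvm_mark_always_dirty(struct kvm *kvm, u32 slot, u64 offset)
  {
          struct kvm_always_dirty_ring *ring = &kvm->always_dirty; /* hypothetical */
          struct kvm_dirty_gfn *entry = &ring->gfns[ring->next % ring->nentries];

          entry->slot = slot;
          entry->offset = offset;
          /* No flags handshake: userspace only ever reads these entries. */
          ring->next++;
  }

  /* In mark_page_dirty_in_slot(): skip pages already advertised above. */
  static bool kvm_is_always_dirty(struct kvm *kvm, u32 slot, u64 offset)
  {
          struct kvm_always_dirty_ring *ring = &kvm->always_dirty; /* hypothetical */
          u32 i;

          for (i = 0; i < ring->next && i < ring->nentries; i++) {
                  if (ring->gfns[i].slot == slot && ring->gfns[i].offset == offset)
                          return true;
          }

          return false;
  }

The key property is that the buffer is write-once from the host's point of
view, so no reset protocol is needed, matching the second point above.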
Thanks,
Gavin