On Mon, 07 Nov 2022 10:45:34 +0000, Gavin Shan <gshan@xxxxxxxxxx> wrote: > > Hi Marc, Peter, Oliver and Sean, > > On 11/5/22 7:40 AM, Gavin Shan wrote: > > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is > > enabled. It's conflicting with that ring-based dirty page tracking always > > requires a running VCPU context. > > > > Introduce a new flavor of dirty ring that requires the use of both VCPU > > dirty rings and a dirty bitmap. The expectation is that for non-VCPU > > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to > > the dirty bitmap. Userspace should scan the dirty bitmap before migrating > > the VM to the target. > > > > Use an additional capability to advertise this behavior. The newly added > > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added > > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL. > > > > Suggested-by: Marc Zyngier <maz@xxxxxxxxxx> > > Suggested-by: Peter Xu <peterx@xxxxxxxxxx> > > Co-developed-by: Oliver Upton <oliver.upton@xxxxxxxxx> > > Signed-off-by: Oliver Upton <oliver.upton@xxxxxxxxx> > > Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx> > > Acked-by: Peter Xu <peterx@xxxxxxxxxx> > > --- > > Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++----- > > include/linux/kvm_dirty_ring.h | 7 +++++ > > include/linux/kvm_host.h | 1 + > > include/uapi/linux/kvm.h | 1 + > > virt/kvm/Kconfig | 8 ++++++ > > virt/kvm/dirty_ring.c | 10 +++++++ > > virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++------- > > 7 files changed, 93 insertions(+), 16 deletions(-) > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index eee9f857a986..2ec32bd41792 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one > > needs to kick the vcpu out of KVM_RUN using a signal. The resulting > > vmexit ensures that all dirty GFNs are flushed to the dirty rings. > > -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding > > -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls > > -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling > > -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual > > -machine will switch to ring-buffer dirty page tracking and further > > -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail. > > - > > NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that > > should be exposed by weakly ordered architecture, in order to indicate > > the additional memory ordering requirements imposed on userspace when > > @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to > > expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL > > to userspace. > > +After using the dirty rings, the userspace needs to detect the > > capability > > +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures > > +need to be backed by per-slot bitmaps. With this capability advertised > > +and supported, it means the architecture can dirty guest pages without > > +vcpu/ring context, so that some of the dirty information will still be > > +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP > > +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL > > +has been enabled. > > + > > +Note that the bitmap here is only a backup of the ring structure, and > > +normally should only contain a very small amount of dirty pages, which > > +needs to be transferred during VM downtime. Collecting the dirty bitmap > > +should be the very last thing that the VMM does before transmitting state > > +to the target VM. VMM needs to ensure that the dirty state is final and > > +avoid missing dirty pages from another ioctl ordered after the bitmap > > +collection. > > + > > +To collect dirty bits in the backup bitmap, the userspace can use the > > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed > > +and its behavior is undefined since collecting the dirty bitmap always > > +happens in the last phase of VM's migration. > > + > > +NOTE: One example of using the backup bitmap is saving arm64 vgic/its > > +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on > > +KVM device "kvm-arm-vgic-its" during VM's migration. > > + > > In order to speed up the review and reduce unnecessary respins. After > collecting comments on PATCH[v8 3/7] from Marc and Peter, I would change > above description as below. Could you please confirm it looks good to you? > > In the 4th paragraph, the words starting from "Collecting the dirty bitmap..." > to the end, was previously suggested by Oliver, even Marc suggested to avoid > mentioning "migration". > > After enabling the dirty rings, the userspace needs to detect the > capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring > structures need to be backed by per-slot bitmaps. With this capability s/need/can/. If there was a *need*, it should happen automatically without user intervention. > advertised, it means the architecture can dirty guest pages without > vcpu/ring context, so that some of the dirty information will still be > maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP > can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL > hasn't been enabled, or any memslot has been existing. > > Note that the bitmap here is only a backup of the ring structure. The > use of the ring and bitmap combination is only beneficial if there is > only a very small amount of memory that is dirtied out of vcpu/ring > context. Otherwise, the stand-alone per-slot bitmap mechanism needs to > be considered. > > To collect dirty bits in the backup bitmap, userspace can use the same > KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all > the generation of the dirty bits is done in a single pass. Collecting > the dirty bitmap should be the very last thing that the VMM does before > transmitting state to the target VM. VMM needs to ensure that the dirty > state is final and avoid missing dirty pages from another ioctl ordered > after the bitmap collection. I would replace "transmitting state to the target VM" with "considering the state as complete", as I still object to casting this API into the migration mold. People use this stuff more far more than migration (checkpointing, for example). > > NOTE: One example of using the backup bitmap is saving arm64 vgic/its > tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on > KVM device "kvm-arm-vgic-its" during VM's migration. Same remark about migration. Thanks, M. -- Without deviation from the norm, progress is not possible.