On 14.11.2012, at 05:33, Paul Mackerras wrote:

> A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on
> this fd return the contents of the HPT (hashed page table), writes
> create and/or remove entries in the HPT. There is a new capability,
> KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl
> takes an argument structure with the index of the first HPT entry to
> read out and a set of flags. The flags indicate whether the user is
> intending to read or write the HPT, and whether to return all entries
> or only the "bolted" entries (those with the bolted bit, 0x10, set in
> the first doubleword).
>
> This is intended for use in implementing qemu's savevm/loadvm and for
> live migration. Therefore, on reads, the first pass returns information
> about all HPTEs (or all bolted HPTEs). When the first pass reaches the
> end of the HPT, it returns from the read. Subsequent reads only return
> information about HPTEs that have changed since they were last read.
> A read that finds no changed HPTEs in the HPT following where the last
> read finished will return 0 bytes.
>
> The format of the data provides a simple run-length compression of the
> invalid entries. Each block of data starts with a header that indicates
> the index (position in the HPT, which is just an array), the number of
> valid entries starting at that index (may be zero), and the number of
> invalid entries following those valid entries. The valid entries, 16
> bytes each, follow the header. The invalid entries are not explicitly
> represented.
>
> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxx>
> ---
>  Documentation/virtual/kvm/api.txt        |   53 +++++
>  arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
>  arch/powerpc/include/asm/kvm_ppc.h       |    2 +
>  arch/powerpc/include/uapi/asm/kvm.h      |   24 +++
>  arch/powerpc/kvm/book3s_64_mmu_hv.c      |  344 ++++++++++++++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c             |   12 --
>  arch/powerpc/kvm/powerpc.c               |   17 ++
>  include/uapi/linux/kvm.h                 |    3 +
>  8 files changed, 461 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 6671fdc..33080ea 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2071,6 +2071,59 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
>
>  Note that the vcpu ioctl is asynchronous to vcpu execution.
>
> +4.78 KVM_PPC_GET_HTAB_FD
> +
> +Capability: KVM_CAP_PPC_HTAB_FD
> +Architectures: powerpc
> +Type: vm ioctl
> +Parameters: Pointer to struct kvm_get_htab_fd (in)
> +Returns: file descriptor number (>= 0) on success, -1 on error
> +
> +This returns a file descriptor that can be used either to read out the
> +entries in the guest's hashed page table (HPT), or to write entries to
> +initialize the HPT. The returned fd can only be written to if the
> +KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
> +can only be read if that bit is clear. The argument struct looks like
> +this:
> +
> +/* For KVM_PPC_GET_HTAB_FD */
> +struct kvm_get_htab_fd {
> +	__u64 flags;
> +	__u64 start_index;
> +};
> +
> +/* Values for kvm_get_htab_fd.flags */
> +#define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1)
> +#define KVM_GET_HTAB_WRITE ((__u64)0x2)
> +
> +The `start_index' field gives the index in the HPT of the entry at
> +which to start reading. It is ignored when writing.
> +
> +Reads on the fd will initially supply information about all
> +"interesting" HPT entries. Interesting entries are those with the
> +bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
> +all entries. When the end of the HPT is reached, the read() will
> +return. If read() is called again on the fd, it will start again from
> +the beginning of the HPT, but will only return HPT entries that have
> +changed since they were last read.
> +
> +Data read or written is structured as a header (8 bytes) followed by a
> +series of valid HPT entries (16 bytes) each. The header indicates how
> +many valid HPT entries there are and how many invalid entries follow
> +the valid entries. The invalid entries are not represented explicitly
> +in the stream. The header format is:
> +
> +struct kvm_get_htab_header {
> +	__u32 index;
> +	__u16 n_valid;
> +	__u16 n_invalid;
> +};
> +
> +Writes to the fd create HPT entries starting at the index given in the
> +header; first `n_valid' valid entries with contents from the data
> +written, then `n_invalid' invalid entries, invalidating any previously
> +valid entries found.
> +
>
>  5. The kvm_run structure
>  ------------------------
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 4ca4f25..dc0a78d 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -243,4 +243,22 @@ static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
>  	return !(memslot->base_gfn & mask) && !(memslot->npages & mask);
>  }
>
> +static inline unsigned long slb_pgsize_encoding(unsigned long psize)
> +{
> +	unsigned long senc = 0;
> +
> +	if (psize > 0x1000) {
> +		senc = SLB_VSID_L;
> +		if (psize == 0x10000)
> +			senc |= SLB_VSID_LP_01;

Is this always accurate?

> +	}
> +	return senc;
> +}
> +
> +static inline int is_vrma_hpte(unsigned long hpte_v)
> +{
> +	return (hpte_v & ~0xffffffUL) ==
> +		(HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)));
> +}
> +
>  #endif /* __ASM_KVM_BOOK3S_64_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 609cca3..1ca31e9 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -164,6 +164,8 @@ extern void kvmppc_bookehv_exit(void);
>
>  extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
>
> +extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
> +
>  /*
>   * Cuts out inst bits with ordering according to spec.
>   * That means the leftmost bit is zero. All given bits are included.
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index b89ae4d..6518e38 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -331,6 +331,30 @@ struct kvm_book3e_206_tlb_params {
>  	__u32 reserved[8];
>  };
>
> +/* For KVM_PPC_GET_HTAB_FD */
> +struct kvm_get_htab_fd {
> +	__u64 flags;
> +	__u64 start_index;

Please add some padding here, in case we need more information passed down. 16 bytes should be enough.

The actual protocol looks quite good :). And if it doesn't work for us anymore, we can always bump it to a new protocol using the flags in the above ioctl.


Alex
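
For reference, the stream layout described in the quoted api.txt text (an 8-byte header, then n_valid entries of 16 bytes each, with the n_invalid entries left implicit) could be walked in userspace roughly as below. This is only a sketch under stated assumptions: `struct htab_header`, `struct htab_stats` and `htab_walk()` are hypothetical names that are not part of the patch, the struct is a local mirror of kvm_get_htab_header, and the header fields are assumed to arrive in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_get_htab_header from the quoted patch
 * (hypothetical name; assumes fields are in host byte order). */
struct htab_header {
	uint32_t index;     /* position of the first entry in the HPT */
	uint16_t n_valid;   /* valid entries that follow, 16 bytes each */
	uint16_t n_invalid; /* invalid entries after those, not in the stream */
};

#define HPTE_SIZE 16 /* bytes per valid HPT entry, per the quoted docs */

/* Hypothetical accumulator for the walk below. */
struct htab_stats {
	unsigned long valid;
	unsigned long invalid;
};

/*
 * Walk a buffer filled by read() on the HTAB fd and count the entries
 * it describes. Returns 0 on success, -1 if the buffer ends in the
 * middle of a header or in the middle of the valid-entry payload.
 */
static int htab_walk(const unsigned char *buf, size_t len,
		     struct htab_stats *st)
{
	size_t off = 0;

	assert(buf != NULL && st != NULL);
	st->valid = 0;
	st->invalid = 0;

	while (off < len) {
		struct htab_header hdr;
		size_t payload;

		if (len - off < sizeof(hdr))
			return -1; /* truncated header */
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		payload = (size_t)hdr.n_valid * HPTE_SIZE;
		if (len - off < payload)
			return -1; /* truncated entries */
		off += payload; /* skip the valid entries themselves */

		st->valid += hdr.n_valid;
		st->invalid += hdr.n_invalid;
	}
	return 0;
}
```

A real consumer would copy each 16-byte entry out at `hdr.index + i` instead of skipping the payload; the counting version just keeps the framing logic visible.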