On 10/31/2011 10:12 PM, Scott Wood wrote:
> >> +4.59 KVM_DIRTY_TLB
> >> +
> >> +Capability: KVM_CAP_SW_TLB
> >> +Architectures: ppc
> >> +Type: vcpu ioctl
> >> +Parameters: struct kvm_dirty_tlb (in)
> >> +Returns: 0 on success, -1 on error
> >> +
> >> +struct kvm_dirty_tlb {
> >> +	__u64 bitmap;
> >> +	__u32 num_dirty;
> >> +};
> >
> > This is not 32/64 bit safe.  e500 is 32-bit only, yes?
>
> e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.
>
> > but what if someone wants to emulate an e500 on a ppc64?  maybe it's better to add
> > padding here.
>
> What is unsafe about it?  Are you picturing TLBs with more than 4
> billion entries?

sizeof(struct kvm_dirty_tlb) == 12 for 32-bit userspace, but == 16 for
64-bit userspace and the kernel.  ABI structures must have the same
alignment and size for 32-bit and 64-bit userspace, or they need compat
handling.

> There shouldn't be any alignment issues.
>
> > Another alternative is to drop the num_dirty field (and let the kernel
> > compute it instead, shouldn't take long?), and have the third argument
> > to ioctl() reference the bitmap directly.
>
> The idea was to make it possible for the kernel to apply a threshold
> above which it would be better to ignore the bitmap entirely and flush
> everything:
>
> http://www.spinics.net/lists/kvm/msg50079.html
>
> Currently we always just flush everything, and QEMU always says
> everything is dirty when it makes a change, but the API is there if needed.

Right, but you don't need num_dirty for that.  There are typically only a
few dozen entries, yes?  It should take a trivial amount of time to
calculate the bitmap's weight.

> >> +Configures the virtual CPU's TLB array, establishing a shared memory area
> >> +between userspace and KVM.  The "params" and "array" fields are userspace
> >> +addresses of mmu-type-specific data structures.  The "array_len" field is a
> >> +safety mechanism, and should be set to the size in bytes of the memory that
> >> +userspace has reserved for the array.  It must be at least the size dictated
> >> +by "mmu_type" and "params".
> >> +
> >> +While KVM_RUN is active, the shared region is under control of KVM.  Its
> >> +contents are undefined, and any modification by userspace results in
> >> +boundedly undefined behavior.
> >> +
> >> +On return from KVM_RUN, the shared region will reflect the current state of
> >> +the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
> >> +to tell KVM which entries have been changed, prior to calling KVM_RUN again
> >> +on this vcpu.
> >
> > We already have another mechanism for such shared memory,
> > mmap(vcpu_fd).  x86 uses it for the coalesced mmio region as well as the
> > traditional kvm_run area.  Please consider using it.
>
> What does it buy us, other than needing a separate codepath in QEMU to
> allocate the memory differently based on whether KVM (and this feature)
> are being used, since QEMU uses this for its own MMU representation?

The ability to use get_free_pages() and ordinary kernel memory directly,
instead of indirection through a struct page ** array.

> This API has been discussed extensively, and the code using it is
> already in mainline QEMU.  This aspect of it hasn't changed since the
> discussion back in February:
>
> http://www.spinics.net/lists/kvm/msg50102.html
>
> I'd prefer to avoid another round of major overhaul without a really
> good reason.

Me too, but I also prefer not to make ABI choices by inertia.  ABI is
practically the only thing I care about wrt non-x86 (other than
whitespace, of course).  Please involve me in the discussions earlier in
the future.
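
For concreteness, the sizing issue raised earlier in this message can be
shown with a small, self-contained C sketch.  The struct names and the
"pad" field below are invented for illustration; only the bitmap and
num_dirty fields come from the posted patch, and this is not a claim about
what layout was or should be adopted.

#include <stdint.h>
#include <stdio.h>

/*
 * The layout as posted.  On ABIs that align 64-bit types to 4 bytes
 * (32-bit x86, for example) this is 12 bytes; on 64-bit ABIs the
 * trailing hole pads it to 16 -- the mismatch noted above.
 */
struct kvm_dirty_tlb_as_posted {
	uint64_t bitmap;	/* userspace address of the dirty bitmap */
	uint32_t num_dirty;	/* number of bits set in the bitmap */
};

/*
 * One way to follow the "add padding" suggestion: make the tail
 * explicit so the structure is 16 bytes on every ABI and needs no
 * compat handling.
 */
struct kvm_dirty_tlb_padded {
	uint64_t bitmap;
	uint32_t num_dirty;
	uint32_t pad;		/* explicit; keeps sizeof() == 16 everywhere */
};

_Static_assert(sizeof(struct kvm_dirty_tlb_padded) == 16,
	       "padded layout must be 16 bytes regardless of ABI");

int main(void)
{
	printf("as posted: %zu bytes, padded: %zu bytes\n",
	       sizeof(struct kvm_dirty_tlb_as_posted),
	       sizeof(struct kvm_dirty_tlb_padded));
	return 0;
}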
> >> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
> >> + - The "params" field is of type "struct kvm_book3e_206_tlb_params".
> >> + - The "array" field points to an array of type "struct
> >> +   kvm_book3e_206_tlb_entry".
> >> + - The array consists of all entries in the first TLB, followed by all
> >> +   entries in the second TLB.
> >> + - Within a TLB, entries are ordered first by increasing set number.  Within a
> >> +   set, entries are ordered by way (increasing ESEL).
> >> + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
> >> +   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
> >> + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
> >> +   hardware ignores this value for TLB0.
> >
> > Holy shit.
>
> You were the one that first suggested we use shared data:
> http://www.spinics.net/lists/kvm/msg49802.html
>
> These are the assumptions needed to make such an interface well-defined.

Just remarking on the complexity, don't take it personally.

> >> @@ -95,6 +90,9 @@ struct kvmppc_vcpu_e500 {
> >>  	u32 tlb1cfg;
> >>  	u64 mcar;
> >>
> >> +	struct page **shared_tlb_pages;
> >> +	int num_shared_tlb_pages;
> >> +
> >
> > I missed the requirement that things be page aligned.
>
> They don't need to be, we'll ignore the data before and after the shared
> area.
>
> > If you use mmap(vcpu_fd) this becomes simpler; you can use
> > get_free_pages() and have a single pointer.  You can also use vmap() on
> > this array (but get_free_pages() is faster).
>
> We do use vmap().  This is just the bookkeeping so we know what pages to
> free later.

Ah, I missed that (and the pointer).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
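
The set/way ordering and the TLB0 hash quoted above are easy to get wrong,
so here is a hedged C sketch of the index arithmetic they imply.  The
structure, function names, and example geometry are invented for
illustration; only the ordering rules and the hash come from the quoted
documentation, and this is not code from the patch or from QEMU.

#include <stdint.h>
#include <stdio.h>

/* Per-TLB geometry, mirroring the tlb_sizes[]/tlb_ways[] params above. */
struct tlb_geometry {
	uint32_t sizes[2];	/* total entries in TLB0 and TLB1 */
	uint32_t ways[2];	/* associativity of TLB0 and TLB1 */
};

/* TLB0 set selection, per the quoted hash: (MAS2 >> 12) & (num_sets - 1). */
static uint32_t tlb0_set(uint64_t mas2, const struct tlb_geometry *geo)
{
	uint32_t num_sets = geo->sizes[0] / geo->ways[0];

	return (uint32_t)(mas2 >> 12) & (num_sets - 1);
}

/*
 * Index into the shared entry array: all of TLB0 first, then all of
 * TLB1; within a TLB, entries are grouped by set and ordered by way
 * (increasing ESEL) inside each set.
 */
static uint32_t shared_index(unsigned int tlb, uint32_t set, uint32_t way,
			     const struct tlb_geometry *geo)
{
	uint32_t base = (tlb == 0) ? 0 : geo->sizes[0];

	return base + set * geo->ways[tlb] + way;
}

int main(void)
{
	/* Example geometry only -- not a claim about any particular core. */
	struct tlb_geometry geo = { .sizes = { 512, 16 }, .ways = { 4, 16 } };
	uint64_t mas2 = 0x10002000ULL;	/* arbitrary effective-address bits */
	uint32_t set = tlb0_set(mas2, &geo);

	printf("TLB0 set %u, way 1 -> array index %u\n",
	       set, shared_index(0, set, 1, &geo));
	printf("TLB1 entry ESEL 3   -> array index %u\n",
	       shared_index(1, 0, 3, &geo));
	return 0;
}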