On Mon, 7 Feb 2011 16:43:02 +0100
Alexander Graf <agraf@xxxxxxx> wrote:

> On 04.02.2011, at 23:33, Scott Wood wrote:
> 
> > On Thu, 3 Feb 2011 10:19:06 +0100
> > Alexander Graf <agraf@xxxxxxx> wrote:
> > 
> >> Makes sense. So we basically need an ioctl that tells KVM the MMU
> >> type and TLB size. Remember, the userspace tool is the place for
> >> policies :).
> > 
> > Maybe, though keeping it in KVM means we can change it whenever we
> > want without having to sync up Qemu and worry about backward
> > compatibility.
> 
> Quite the contrary - you have to worry more about backward
> compatibility. If we implement a new feature that doesn't work on old
> kernels, we can just tell qemu to not work on those old versions. For
> the kernel interfaces, we have to keep supporting old userspace.

If you're talking about actual interface changes, yes.  But a change in
how KVM implements things behind the scenes shouldn't break an old
Qemu, unless it's buggy and makes assumptions not permitted by the API.

> > How's that different from backing the void pointer up with a
> > different struct depending on the MMU type?  We weren't proposing
> > unions.
> > 
> > A fixed array does mean you wouldn't have to worry about whether
> > qemu supports the more advanced struct format if fields are added --
> > you can just unconditionally write it, as long as it's backwards
> > compatible.  Unless you hit the limit of the pre-determined array
> > size, that is.  And if that gets made higher for future expansion,
> > that's even more data that has to get transferred, before it's
> > really needed.
> 
> Yes, it is. And I don't see how we could easily avoid it. Maybe just
> pass in a random __user pointer that we directly write to from kernel
> space and tell qemu how big and what type a tlb entry is?
> 
> struct request_ppc_tlb {
> 	int tlb_type;
> 	int tlb_entries;
> 	uint64_t __user *tlb_data;
> };

That's pretty much what the proposed API does -- except it uses a void
pointer instead of uint64_t *.

> Would you really want to loop through 16k entries, doing an ioctl for
> each?

Not really.  The API was modeled after something we did on Topaz where
it's just a function call.  But something array-based would have been
awkward without constraining the geometry.  Now that we're going to
constrain the geometry, providing an array-based get/set would be easy
and should definitely be a part of the API.

> Then performance really would always be an issue.

For cases where you really need to do a full get/set, yes.

> I would really prefer we tackle it with a full-on tlb get/set first
> and then put the very flexible one on top, because to me the full-on
> approach feels like the more generic one.

I'm very open to adding an individual tlb get/set and maybe even a
"kvm, please translate EA x to RA y" ioctl.  But those should come
after we cover the big hammer that just copies everything.  If we add
everything at once, it minimizes the possibilities that Qemu has to
deal with -- either the full MMU API is there, or it's not.

BTW, I wonder if we should leave PPC out of the name.  It seems like
any arch with a software-visible TLB could use this, since the hw
details are hidden behind the MMU type.

How about:

struct kvmppc_booke_tlb_entry {
	union {
		__u64 mas0_1;
		struct {
			__u32 mas0;
			__u32 mas1;
		};
	};
	__u64 mas2;
	union {
		__u64 mas7_3;
		struct {
			__u32 mas7;
			__u32 mas3;
		};
	};
	__u32 mas8;
	__u32 pad;
};

struct kvmppc_booke_tlb_params {
	/*
	 * book3e defines 4 TLBs.  Individual implementations may have
	 * fewer.  TLBs that do not exist on the target must be
	 * configured with a size of zero.  KVM will adjust TLBnCFG
	 * based on the sizes configured here, though arrays greater
	 * than 2048 entries will have TLBnCFG[NENTRY] set to zero.
	 */
	__u32 tlb_sizes[4];
};

struct kvmppc_booke_tlb_search {
	struct kvmppc_booke_tlb_entry entry;
	union {
		__u64 mas5_6;
		struct {
			__u32 mas5;
			__u32 mas6;
		};
	};
};

For an MMU type of PPC_BOOKE_NOHV, the mas5 field in
kvmppc_booke_tlb_search and the mas8 field in kvmppc_booke_tlb_entry
are present but not supported.

For an MMU type of PPC_BOOKE_NOHV or PPC_BOOKE_HV:
 - TLB entries in get/set arrays may be provided in any order, and all
   TLBs are get/set at once.
 - An entry with MAS1[V] = 0 terminates the list early (but there will
   be no terminating entry if the full array is valid).  On a call to
   KVM_GET_TLB, the contents of elements after the terminator are
   undefined.  On a call to KVM_SET_TLB, excess elements beyond the
   terminating entry will not be accessed by KVM.

[Note: Once we implement sregs, Qemu can determine which TLBs are
implemented by reading MMUCFG/TLBnCFG -- but in no case should a TLB
be unsupported by KVM if its existence is implied by the target CPU]

KVM_SET_TLB
-----------

Capability: KVM_CAP_SW_TLB
Type: vcpu ioctl
Parameters: struct kvm_set_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_set_tlb {
	__u64 params;
	__u64 array;
	__u32 mmu_type;
};

[Note: I used __u64 rather than void * to avoid the need for special
compat handling with 32-bit userspace on a 64-bit kernel -- if the
other way is preferred, that's fine with me]

Configures and sets the virtual CPU's TLB array.  The "params" and
"array" fields are userspace addresses of mmu-type-specific data
structures.

For mmu types PPC_BOOKE_NOHV and PPC_BOOKE_HV, the "params" field is
of type "struct kvmppc_booke_tlb_params", and the "array" field points
to an array of type "struct kvmppc_booke_tlb_entry".

[Note: KVM_SET_TLB with early array termination makes a separate
invalidate call unnecessary.]
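To make the calling convention concrete, here's a rough sketch of the
userspace side.  Everything in it -- the ioctl number, the mmu type
constants, and the structs -- is the proposal above, not anything in
today's headers, and the 512-entry TLB0 / 64-entry TLB1 geometry is
just an example:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>

#define TLB0_SIZE	512	/* example geometry for this target */
#define TLB1_SIZE	64

/*
 * "array" must have room for TLB0_SIZE + TLB1_SIZE entries; the first
 * "nvalid" of them hold valid (MAS1[V] = 1) entries.
 */
static int configure_tlb(int vcpu_fd, __u32 mmu_type,
			 struct kvmppc_booke_tlb_entry *array,
			 __u32 nvalid)
{
	struct kvmppc_booke_tlb_params params;
	struct kvm_set_tlb set;

	memset(&params, 0, sizeof(params));
	params.tlb_sizes[0] = TLB0_SIZE;
	params.tlb_sizes[1] = TLB1_SIZE;
	/* tlb_sizes[2] and [3] stay zero: those TLBs don't exist here */

	/* terminate the list early if not every element is valid */
	if (nvalid < TLB0_SIZE + TLB1_SIZE)
		array[nvalid].mas1 = 0;	/* MAS1[V] = 0 */

	set.params = (__u64)(unsigned long)&params;
	set.array = (__u64)(unsigned long)array;
	set.mmu_type = mmu_type;	/* PPC_BOOKE_NOHV or PPC_BOOKE_HV */

	return ioctl(vcpu_fd, KVM_SET_TLB, &set);
}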
KVM_GET_TLB
-----------

Capability: KVM_CAP_SW_TLB
Type: vcpu ioctl
Parameters: void pointer (out)
Returns: 0 on success, -1 on error

Reads the TLB array from a virtual CPU.  A successful call to
KVM_SET_TLB must have been previously made on this vcpu.  The argument
must point to space for an array of the size and type of TLB entry
structs configured by the most recent successful call to KVM_SET_TLB.

For mmu types BOOKE_NOHV and BOOKE_HV, the array is of type "struct
kvmppc_booke_tlb_entry", and must hold a number of entries equal to
the sum of the elements of tlb_sizes in the most recent successful TLB
configuration call.

KVM_SEARCH_TLB
--------------

Capability: KVM_CAP_SW_TLB
Type: vcpu ioctl
Parameters: void pointer (in/out)
Returns: 0 on success, -1 on error

Searches the TLB array of a virtual CPU for an entry matching
mmu-type-specific parameters.  A successful call to KVM_SET_TLB must
have been previously made on this vcpu.

For mmu types BOOKE_NOHV and BOOKE_HV, the argument must point to a
struct of type "kvmppc_booke_tlb_search".  The search operates as a
tlbsx instruction.
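Continuing the sketch above, with the same caveats, plus one more
assumption on my part: that the effective address to search for goes
in entry.mas2 and the PID/AS search keys in mas6, as they would for
tlbsx:

/* "array" must hold the sum of the tlb_sizes from the last
 * successful KVM_SET_TLB. */
static int get_tlb(int vcpu_fd, struct kvmppc_booke_tlb_entry *array)
{
	return ioctl(vcpu_fd, KVM_GET_TLB, array);
}

/* tlbsx-style lookup; on success KVM fills in the matching entry. */
static int search_tlb(int vcpu_fd, __u64 ea, __u32 mas6,
		      struct kvmppc_booke_tlb_search *s)
{
	memset(s, 0, sizeof(*s));
	s->entry.mas2 = ea;	/* EA to search for */
	s->mas6 = mas6;		/* SPID/SAS, as for tlbsx */

	return ioctl(vcpu_fd, KVM_SEARCH_TLB, s);
}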
[Note: there currently exists a BookE implementation of KVM_TRANSLATE,
but the way it interprets the address is broken on 64-bit, and it
seems to confuse PPC's notion of a virtual address with what was most
likely intended by the x86ish API.  If nothing yet uses it that would
be broken by a change, we may want to reimplement it to take the
address input as just an effective address, interpreted according to
the current state of the vcpu.  This would be a way to provide a
non-hw-specific mechanism for simple virtual->physical translation for
debuggers and such, as long as the caller doesn't care about the
x86ish attributes.]
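For reference, a debugger's EA->RA lookup could then look like the
sketch below.  struct kvm_translation and KVM_TRANSLATE are the
existing generic interface; the effective-address interpretation is
the proposed behavior, not what the BookE implementation does today:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int debugger_ea_to_ra(int vcpu_fd, __u64 ea, __u64 *ra)
{
	struct kvm_translation tr;

	memset(&tr, 0, sizeof(tr));
	tr.linear_address = ea;	/* treated as an effective address */

	if (ioctl(vcpu_fd, KVM_TRANSLATE, &tr) < 0)
		return -1;
	if (!tr.valid)
		return -1;	/* no translation in current vcpu state */

	*ra = tr.physical_address;
	return 0;
}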