Re: RFC: New API for PPC for vcpu mmu access

On Wed, 2 Feb 2011 22:33:41 +0100
Alexander Graf <agraf@xxxxxxx> wrote:

> 
> On 02.02.2011, at 21:33, Yoder Stuart-B08248 wrote:
> 
> > Below is a proposal for a new API for PPC to allow KVM clients
> > to set MMU state in a vcpu.
> > 
> > BookE processors have one or more software managed TLBs and
> > currently there is no mechanism for Qemu to initialize
> > or access them.  This is needed for normal initialization
> > as well as debug.
> > 
> > There are 4 APIs:
> > 
> > -KVM_PPC_SET_MMU_TYPE allows the client to negotiate the type
> > of MMU with KVM-- the type determines the size and format
> > of the data in the other APIs
> 
> This should be done through the PVR hint in sregs, no? Usually a single CPU type only has a single MMU type.

Well, for one, we don't have sregs or a PVR hint on Book E yet. :-)

But also, there could be differing levels of support -- e.g. on e500mc,
we have no plans to support exposing the hardware virtualization
features in a nested manner (nor am I sure that it's reasonably
possible).  But if someone does it, that would be a change in the
interface between Qemu and KVM to allow the extra fields to be set,
with no change in PVR.

Likewise, a new chip could introduce new capabilities, but still be
capable of working the old way.

Plus, basing it on PVR means Qemu needs to be updated every time
there's a new chip with a new PVR.

> > -KVM_PPC_INVALIDATE_TLB invalidates all TLB entries in all
> > TLBs in the vcpu
> > 
> > -KVM_PPC_SET_TLBE sets a TLB entry-- the Power architecture
> > specifies the format of the MMU data passed in
> 
> This seems too fine-grained. I'd prefer a list of all TLB entries to be pushed in either direction. What's the foreseeable number of TLB entries within the next 10 years?

I have no idea what things will look like 10 years down the road, but
currently e500mc has 576 entries (512 TLB0, 64 TLB1).

> Having the whole stack available would make the sync with qemu easier and also allows us to only do a single ioctl for all the TLB management. Thanks to the PVR we know the size of the TLB, so we don't have to shove that around.

No, we don't know the size (or necessarily even the structure) of the
TLB.  KVM may provide a guest TLB that is larger than what hardware has,
as a cache to reduce the number of TLB misses that have to go to the
guest (we do this now in another hypervisor).

Plus sometimes it's just simpler -- why bother halving the size of the
guest TLB when running on e500v1?
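
To make that concrete, here is a rough sketch of the guest TLB shape
being reported by KVM rather than inferred from PVR.  The struct, its
fields and the helper are hypothetical names for illustration only, not
part of the proposal; the only numbers taken from this thread are the
e500mc sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_TLBS 4  /* hypothetical upper bound, not from the proposal */

/* Hypothetical description of the guest-visible TLBs, as KVM would report it. */
struct guest_tlb_geometry {
    uint32_t num_tlbs;                  /* number of TLB arrays in use */
    uint32_t entries[SKETCH_MAX_TLBS];  /* entries in each array */
};

/* Total number of guest TLB entries across all arrays. */
static uint32_t guest_tlb_total(const struct guest_tlb_geometry *g)
{
    uint32_t total = 0;

    assert(g != NULL);
    for (uint32_t i = 0; i < g->num_tlbs && i < SKETCH_MAX_TLBS; i++)
        total += g->entries[i];
    return total;
}
```

With something along these lines, userspace would size its buffers from
what KVM says the guest TLB looks like, whether or not that matches the
hardware underneath.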

> > KVM_PPC_INVALIDATE_TLB
> > ----------------------
> > 
> > Capability: KVM_CAP_PPC_MMU
> > Architectures: powerpc
> > Type: vcpu ioctl
> > Parameters: none
> > Returns: 0 on success, -1 on error
> > 
> > Invalidates all TLB entries in all TLBs of the vcpu.
> 
> The only reason we need to do this is because there's no proper reset function in qemu for the e500 tlb. I'd prefer to have that there and push the TLB contents down on reset.

The other way to look at it is that there's no need for a reset
function if all the state is properly settable. :-)

Which we want anyway for debugging (and migration, though I wonder if
anyone would actually use that with embedded hardware).

> Haven't fully made up my mind on the tlb entry structure yet. Maybe something like
> 
> struct kvm_ppc_booke_tlbe {
>     __u64 data[8];
> };
> 
> would be enough? The rest is implementation dependent anyways. Exposing those details to user space doesn't buy us anything. By keeping it generic we can at least still build against older kernel headers :).

If it's not exposed to userspace, how is userspace going to
interpret/fill in the data?
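
A small sketch of the problem: even something as basic as "is this entry
valid?" needs a named field.  The field names below follow the Book E
MAS registers, but the struct layout and the SKETCH_ constant name are
made up for illustration and are not a proposed ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical described entry; layout invented for illustration. */
struct tlbe_described {
    uint32_t mas1;    /* valid bit, TID, TS, TSIZE */
    uint32_t pad;     /* keeps the 64-bit fields aligned */
    uint64_t mas2;    /* effective page number and WIMGE bits */
    uint64_t mas7_3;  /* real page number and permission bits */
};

#define SKETCH_MAS1_VALID 0x80000000u  /* MAS1[V] */

/* Userspace can answer this only because it knows where MAS1 lives. */
static bool tlbe_is_valid(const struct tlbe_described *e)
{
    assert(e != NULL);
    return (e->mas1 & SKETCH_MAS1_VALID) != 0;
}
```

With an opaque data[8] there is no way to write tlbe_is_valid() without
userspace learning the layout from somewhere else.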

As for kernel headers, I think qemu needs to provide its own copy, like
qemu-kvm does, and like http://kernelnewbies.org/KernelHeaders suggests
for programs which rely on recent kernel APIs (which Qemu+KVM tends
to do already).

> Userspace should only really need the TLB entries for
> 
>   1) Debugging
>   2) Migration
> 
> So I don't see the point in making the interface optimized for single TLB entries. Do you have other use cases in mind?

The third case is reset/init, which can be performance sensitive
(especially in failover setups).

And debugging can require single translations, and can be a
performance issue if you need to toss around several kilobytes of data
per translation, and a debugger is doing e.g. an automated pattern of
single step plus inspect memory.
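
As a rough size check, assuming the 8 x __u64 entry suggested above
(which is only a suggestion at this point) and the e500mc entry counts,
the arithmetic looks like this.  The struct and function names are
placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry layout suggested earlier in the thread: eight opaque 64-bit words. */
struct tlbe_sketch {
    uint64_t data[8];
};

static_assert(sizeof(struct tlbe_sketch) == 64, "entry is 64 bytes");

/* e500mc figures quoted above. */
enum { TLB0_ENTRIES = 512, TLB1_ENTRIES = 64 };

/* Bytes copied per call when the whole guest TLB moves at once. */
static size_t full_sync_bytes(void)
{
    return (size_t)(TLB0_ENTRIES + TLB1_ENTRIES) * sizeof(struct tlbe_sketch);
}

/* Bytes copied per call when only one entry moves. */
static size_t single_entry_bytes(void)
{
    return sizeof(struct tlbe_sketch);
}
```

That is 36864 bytes for a full sync against 64 bytes for one entry, a
factor of 576 on every lookup the debugger makes.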

-Scott
