Re: handling cp15 state which isn't trivially exposed as a single register

On Mon, 20 Aug 2012 22:41:04 +0100, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
> On 20 August 2012 22:19, Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Mon, Aug 20, 2012 at 5:55 AM, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
> >> As I've been working through trying to drive kvm cp15 save/load
> >> via qemu's cpreg hashtable, I ran into an interesting question.
> >>
> >> Some bits of cp15 don't trivially expose their state as a single
> >> register. For example, the cache status registers have a number
> >> of different values which are accessed via writing to a 'select'
> >> register to pick which of them is visible in the 'value' register.
> >> Some of the perf registers come in "write 1 to set, 0 is ignored"
> >> and "write 1 to clear, 0 is ignored" pairs.
> >>
> >> The current kernel ABI just exposes these as registers the same
> >> way the hardware does. Userspace can use this to read/write
> >> the state, but it would have to do it via a sequence of "write
> >> select reg; read value reg; write select reg; read value reg..."
> >> which defeats the intention of the 'many regs' ABI slightly.
> >>
> >> I guess fundamentally somebody has to do the mapping from the
> >> register interface the hw provides to "slurp the state in
> >> and out", so the question is whether it should be userspace
> >> (as now) or the kernel...
> >>
> >> I don't have a concrete suggestion for this yet, I just thought
> >> I'd raise the issue in case somebody else had a good idea.
> >
> > Is there a current need (or reasonably foreseeable future one) to read
> > these registers from user space? If so, I think the multi-kernel
> > crossing approach sounds awful, but an interface for this sounds on
> > the other hand complicated and annoying. I guess to me awful trumps
> > complicated and annoying and the kernel should do the work, but I sure
> > hope it's not a priority.
> 
> Well, typically guests don't expect the cache architecture to
> change under their feet, so if you have a heterogenous set of
> servers (A15 and something-elses) which you want to be able to
> migrate your VMs between, you're going to need to present the
> guest with the least-common-denominator for things like cache
> line size, which means faking the return values from these registers
> on at least one set of hosts. So we don't need it now when all
> we support is A15, but I wouldn't be surprised if we needed it
> later.
> 
> The perf registers case we'll want if and when we get around
> to supporting virtualised performance counters.
> 
> I haven't yet gone through to see if there are any others that
> have non-simple save/load requirements, those are just the first
> two examples.
> 
> One random idea I had was that we could continue to use the
> GET/SET_MSRS API, but we define some part of the space (eg
> top half of index == 0x20) for more convenient direct access
> to this state. Certainly for the cache registers, we'd only
> need to actually implement this if the kernel supported
> showing the guest fake values, in which case it will have the
> fake values in some bit of its structures and use these
> when handling the cache registers on guest access traps. In
> that case actually doing the save/load bit would be pretty
> trivial (in fact easier than handling userspace accesses
> via fiddling the select register!).

Peter, I do enjoy getting up in the morning and reading these emails
from you.  You ask the nastiest questions: it invariably makes me
fill out travel expenses and other administrivia to avoid the headache
of answering :)

We still want to implement it to save the state.  That's been the theory
so far: save everything, even if it's not fungible today.

To be concrete, for CSSELR/CCSIDR, we define a new coproc index:

#define KVM_ARM_MSR_CSSIDR_DEMUX        0x20

Then we use the bottom 4 bits of the CSSELR as the index within it,
e.g. 0x0020 0x0000 gives the CCSIDR when CSSELR == 0.  If you agree, I'll
patch it.
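
A minimal sketch of that indexing, assuming the index is a 32-bit value
with the demux number in the top 16 bits and the CSSELR bits in the
bottom 16 (the packing and both helper names are hypothetical; only the
0x20 value and the "bottom 4 bits" rule come from the text above):

```c
#include <stdint.h>

#define KVM_ARM_MSR_CSSIDR_DEMUX        0x20

/*
 * Assumption: top 16 bits carry the demux number, bottom 16 bits carry
 * the selector. Only the bottom 4 bits of CSSELR pick a CCSIDR.
 */
static uint32_t ccsidr_demux_index(uint32_t csselr)
{
    return ((uint32_t)KVM_ARM_MSR_CSSIDR_DEMUX << 16) | (csselr & 0xfu);
}

/* Recover the CSSELR selector bits from an index built as above. */
static uint32_t ccsidr_demux_selector(uint32_t index)
{
    return index & 0xfu;
}
```

Under this packing, userspace would fetch each CCSIDR by its own index
rather than doing a write-select/read-value sequence per cache level.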

For performance monitors, yes, they are turned off and on in weird ways
(PMCNTENSET/PMCNTENCLR).  I can't quite see why it is this way.  Is it
to avoid having to do a read/write cycle to flip a single counter?

I think here we'd not expose the PMCNTENCLR reg, as it's implied by the
PMCNTENSET reg, *and* we'd write absolute ON/OFF values rather than the
0-is-ignored stuff.
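
A rough sketch of the difference, using a hypothetical in-memory model
of the counter-enable mask (struct pmu_state and all function names are
made up for illustration, not kernel code). The guest-facing pair keeps
the 0-is-ignored behaviour, while the save/restore path reads and
writes the mask as an absolute value:

```c
#include <stdint.h>

/* Hypothetical model of the state behind PMCNTENSET/PMCNTENCLR. */
struct pmu_state {
    uint32_t cnten;     /* one enable bit per counter */
};

/* Guest write to PMCNTENSET: write 1 to set, 0 is ignored. */
static void pmcntenset_write(struct pmu_state *s, uint32_t v)
{
    s->cnten |= v;
}

/* Guest write to PMCNTENCLR: write 1 to clear, 0 is ignored. */
static void pmcntenclr_write(struct pmu_state *s, uint32_t v)
{
    s->cnten &= ~v;
}

/* Save path: one register, returning the absolute enable mask. */
static uint32_t pmcnten_save(const struct pmu_state *s)
{
    return s->cnten;
}

/* Restore path: the written value replaces the mask outright. */
static void pmcnten_restore(struct pmu_state *s, uint32_t v)
{
    s->cnten = v;
}
```

With absolute writes a single restore reproduces the saved mask even if
the destination starts with other counters enabled, which a set-only
write could not do.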

Cheers,
Rusty.

