On Mon, 20 Aug 2012 22:41:04 +0100, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
> On 20 August 2012 22:19, Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Mon, Aug 20, 2012 at 5:55 AM, Peter Maydell <peter.maydell@xxxxxxxxxx> wrote:
> >> As I've been working through trying to drive kvm cp15 save/load
> >> via qemu's cpreg hashtable, I ran into an interesting question.
> >>
> >> Some bits of cp15 don't trivially expose their state as a single
> >> register. For example, the cache status registers have a number
> >> of different values which are accessed via writing to a 'select'
> >> register to pick which of them is visible in the 'value' register.
> >> Some of the perf registers come in "write 1 to set, 0 is ignored"
> >> and "write 1 to clear, 0 is ignored" pairs.
> >>
> >> The current kernel ABI just exposes these as registers the same
> >> way the hardware does. Userspace can use this to read/write
> >> the state, but it would have to do it via a sequence of "write
> >> select reg; read value reg; write select reg; read value reg..."
> >> which defeats the intention of the 'many regs' ABI slightly.
> >>
> >> I guess fundamentally somebody has to do the mapping from the
> >> register interface the hw provides to "slurp the state in
> >> and out", so the question is whether it should be userspace
> >> (as now) or the kernel...
> >>
> >> I don't have a concrete suggestion for this yet, I just thought
> >> I'd raise the issue in case somebody else had a good idea.
> >
> > Is there a current need (or reasonably foreseeable future one) to read
> > these registers from user space? If so, I think the multi-kernel
> > crossing approach sounds awful, but an interface for this sounds on
> > the other hand complicated and annoying. I guess to me awful trumps
> > complicated and annoying and the kernel should do the work, but I sure
> > hope it's not a priority.
>
> Well, typically guests don't expect the cache architecture to
> change under their feet, so if you have a heterogeneous set of
> servers (A15 and something-elses) which you want to be able to
> migrate your VMs between, you're going to need to present the
> guest with the least-common-denominator for things like cache
> line size, which means faking the return values from these registers
> on at least one set of hosts. So we don't need it now when all
> we support is A15, but I wouldn't be surprised if we needed it
> later.
>
> The perf registers case we'll want if and when we get around
> to supporting virtualised performance counters.
>
> I haven't yet gone through to see if there are any others that
> have non-simple save/load requirements, those are just the first
> two examples.
>
> One random idea I had was that we could continue to use the
> GET/SET_MSRS API, but we define some part of the space (eg
> top half of index == 0x20) for more convenient direct access
> to this state. Certainly for the cache registers, we'd only
> need to actually implement this if the kernel supported
> showing the guest fake values, in which case it will have the
> fake values in some bit of its structures and use these
> when handling the cache registers on guest access traps. In
> that case actually doing the save/load bit would be pretty
> trivial (in fact easier than handling userspace accesses
> via fiddling the select register!).

Peter, I do enjoy getting up in the morning and reading these emails
from you. You ask the nastiest questions: it invariably makes me fill
out travel expenses and other administrivia to avoid the headache of
answering :)

We still want to implement it to save the state. That's been the
theory so far: save everything, even if it's not fungible today.

To be concrete, for CSSELR/CCSIDR, we define a new coproc index:

#define KVM_ARM_MSR_CSSIDR_DEMUX 0x20

Then we use the bottom 4 bits of the CSSELR to index into this, eg.
0x0020 0x0000 gives the CCSIDR when CSSELR == 0.

If you agree, I'll patch it.

For performance monitors, yes, they are turned off and on in weird ways
(PMCNTENSET/PMCNTENCLR). I can't quite see why it is this way? Is it
to avoid having to do a read/write cycle to flip a single counter?

I think here we'd not expose the PMCNTENCLR reg, as it's implied by the
PMCNTENSET reg, *and* we'd write absolute ON/OFF values rather than the
0-is-ignored stuff.

Cheers,
Rusty.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
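
For illustration only, here is a rough C sketch of the two ideas in the mail above: forming a demuxed CCSIDR index from CSSELR, and turning an absolute counter-enable mask into the write-1-to-set / write-1-to-clear values the hardware expects. The 32-bit index layout (coproc index in the top 16 bits) is an assumption read off the "0x0020 0x0000" example, and the helper names ccsidr_demux_index, pmcnten_writes and pmcnten_from_absolute are hypothetical; none of this is taken from an actual patch.

```c
#include <stdint.h>

/* From the mail: proposed coproc index for the demuxed CCSIDR values. */
#define KVM_ARM_MSR_CSSIDR_DEMUX 0x20

/*
 * Assumed layout (not confirmed by the mail): a 32-bit index with the
 * coproc index in the top 16 bits and the bottom 4 bits of CSSELR in
 * the low bits, so CSSELR == 0 gives 0x0020 0x0000.
 */
static inline uint32_t ccsidr_demux_index(uint32_t csselr)
{
    return ((uint32_t)KVM_ARM_MSR_CSSIDR_DEMUX << 16) | (csselr & 0xfu);
}

/* Hypothetical pair of values to write to PMCNTENSET and PMCNTENCLR. */
struct pmcnten_writes {
    uint32_t set; /* bits written as 1 to PMCNTENSET (enable)  */
    uint32_t clr; /* bits written as 1 to PMCNTENCLR (disable) */
};

/*
 * Convert an absolute ON/OFF mask into the two "0 is ignored" writes.
 * 'implemented' masks off counters that do not exist on this CPU.
 */
static inline struct pmcnten_writes
pmcnten_from_absolute(uint32_t want, uint32_t implemented)
{
    struct pmcnten_writes w;

    w.set = want & implemented;   /* counters that must end up ON  */
    w.clr = ~want & implemented;  /* counters that must end up OFF */
    return w;
}
```

With this shape, userspace would only ever see one absolute enable value per VCPU, and whichever side restores the state does both writes.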