Re: re-writing on powerpc

Alexander Graf <agraf@xxxxxxx> · Wed, 15 Dec 2010 01:13:18 +0100

On 15.12.2010, at 01:00, Scott Wood wrote:

> On Wed, 15 Dec 2010 00:29:40 +0100
> Alexander Graf <agraf@xxxxxxx> wrote:
> 
>> On 15.12.2010, at 00:17, Scott Wood wrote:
>> 
>>> So that once KVM has an interrupt to deliver, and sees that critical is
>>> engaged, it knows that the next magic page store will resolve things.
>>> Either it is a store to critical, and KVM can now deliver the
>>> interrupt -- or it is some other store (scratch or MSR itself) and thus
>>> int_pending has not yet been checked.
>>> 
>>> I don't think it would be a problem for live patching.  It just seems a
>>> bit icky.
>> 
>> Oh, because you'd only trap stores, but no writes? Yep, that would work.
> 
> "writes" or "loads"? :-)

s/writes/loads/ :). Sorry, it's 1am here :).

> 
>> I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.
> 
> Well, there's another complication -- if we trap on the final store to
> end the critical section, the critical section won't actually be ended
> until after that instruction executes.  Which won't happen until we set
> the page to read/write and let it go.  So we'd have to look at the
> instruction to see what it's doing.

Yep, which is why I proposed the thing below. We'd have to emulate the uncrit store and then automatically inject the interrupt because we're not in the crit section anymore.

> 
>> Thinking about the whole thing - can't we create an "interrupt notification page"? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? > Then we could just do an unconditional store after the crit section is done and everyone's happy.
> 
> I'd limit it to interrupts that were deferred due to critical,
> to avoid unnecessary MMU manipulation, and unnecessary traps when doing
> mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero
> (assuming the guest doesn't use a separate restore path for that case).

Hrm, we already do treat EE 0 -> 1 changes differently in the code:

        /* Check if we have to fetch an interrupt */
        lwz     r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
        cmpwi   r31, 0
        beq+    no_check

        /* Check if we may trigger an interrupt */
        andi.   r30, r30, MSR_EE
        beq     no_check

        SCRATCH_RESTORE

        /* Nag hypervisor */
kvm_emulate_mtmsrd_orig_ins:
        tlbsync

        b       kvm_emulate_mtmsrd_branch

So the only thing we need to do is to get rid of the MAGIC_INT check and unconditionally store a random register into the notification page (-2*PAGE_SIZE) instead of the tlbsync instruction (sure, things there are slightly more compllicated because we actually use the real mtmsr, but you get the point).

> 
> But otherwise sounds reasonable, if we're willing to change the
> interface that much.  Does it even need to be read-only, or could it be
> entirely unmapped when there's a pending interrupt?

I don't see why we shouldn't add this to the interface. But I'd leave this in the back of our heads for a couple more days so in case we end up coming up with an even better idea, we rather implement that one :)

It could be read-only or unmapped - doesn't really matter. It has to be a store, because otherwise we'd clobber a guest register. It could however be a single physical page shared by all guests. We don't care about the contents written into that page anyways.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html