Re: Reset problem vs. MMIO emulation, hypercalls, etc...

David Gibson <dwg@xxxxxxxxxxx> · Wed, 8 Aug 2012 10:49:48 +1000

On Tue, Aug 07, 2012 at 04:13:49PM +0300, Avi Kivity wrote:
> On 08/07/2012 03:14 PM, David Gibson wrote:
> > On Tue, Aug 07, 2012 at 11:46:35AM +0300, Avi Kivity wrote:
> >> On 08/07/2012 04:32 AM, David Gibson wrote:
> >> > On Tue, Aug 07, 2012 at 06:57:57AM +1000, Benjamin Herrenschmidt wrote:
> >> >> On Mon, 2012-08-06 at 13:13 +1000, David Gibson wrote:
> >> >> > So, I'm still trying to nut out the implications for H_CEDE, and think
> >> >> > if there are any other hypercalls that might want to block the guest
> >> >> > for a time.  We were considering blocking H_PUT_TCE if qemu devices
> >> >> > had active dma maps on the previously mapped iovas.  I'm not sure if
> >> >> > the discussions that led to the inclusion of the qemu IOMMU code
> >> >> > decided that was wholly unnecessary or just not necessary for the
> >> >> > time being.
> >> >> 
> >> >> For "sleeping hcalls" they will simply have to set exit_request to
> >> >> complete the hcall from the kernel perspective, leaving us in a state
> >> >> where the kernel is about to restart at srr0 + 4, along with some other
> >> >> flag (stop or halt) to actually freeze the vcpu.
> >> >> 
> >> >> If such an "async" hcall decides to return an error, it can then set
> >> >> gpr3 directly using ioctls before restarting the vcpu.
> >> > 
> >> > Yeah, I'd pretty much convinced myself of that by the end of
> >> > yesterday.  I hope to send patches implementing these fixes today.
> >> > 
> >> > There are also some questions about why our in-kernel H_CEDE works
> >> > kind of differently from x86's hlt instruction implementation (which
> >> > comes out to qemu unless the irqchip is in-kernel as well).  I don't
> >> > think we have an urgent problem there though.
> >> 
> >> It's the other way round, hlt sleeps in the kernel unless the irqchip is
> >> not in the kernel.
> > 
> > That's the same as what I said.
> 
> I meant to stress that the normal way which other archs should emulate
> is sleep-in-kernel.

Ok.

> > We never have irqchip in kernel (because we haven't written that yet)
> > but we still sleep in-kernel for CEDE.  I haven't spotted any problem
> > with that, but now I'm wondering if there is one, since x86 doesn't do
> > it in what seems like the analogous situation.
> > 
> > It's possible this works because our decrementer (timer) interrupts
> > are different at the core level from external interrupts coming from
> > the PIC, and *are* handled in kernel, but I haven't actually followed
> > the logic to work out if this is the case.
> > 
> >>  Meaning the normal state of things is to sleep in
> >> the kernel (whether or not you have an emulated interrupt controller in
> >> the kernel -- the term irqchip in kernel is overloaded for x86).
> > 
> > Uh.. overloaded in what way?
> 
> On x86, irqchip-in-kernel means that the local APICs, the IOAPIC, and
> the two PICs are emulated in the kernel.  Now the IOAPIC and the PICs
> correspond to non-x86 interrupt controllers, but the local APIC is more
> tightly coupled to the core.  Interrupt acceptance by the core is an
> operation that involves synchronous communication with the local APIC:
> the APIC presents the interrupt, the core accepts it based on the value
> of the interrupt enable flag and possibly a register (CR8), then the
> APIC updates the ISR and IRR.
> 
> The upshot is that if the local APIC is in userspace, interrupts must be
> synchronous with vcpu execution, so that KVM_INTERRUPT is a vcpu ioctl
> and HLT is emulated in userspace (so that local APIC emulation can check
> if an interrupt wakes it up or not).

Sorry, still not 100% getting it.  When the vcpu is actually running
code, that synchronous communication must still be accomplished via
the KVM_INTERRUPT ioctl, yes?  So what makes HLT different, such that
the communication can't be accomplished in that case?

> As soon as the local APIC is
> emulated in the kernel, HLT can be emulated there as well, and
> interrupts become asynchronous (KVM_IRQ_LINE, a vm ioctl).
> 
> So irqchip_in_kernel, for most discussions, really means whether
> interrupt queuing is synchronous or asynchronous.  It has nothing to do
> with the interrupt controllers per se.  All non-x86 archs always have
> irqchip_in_kernel() in this sense.
> 
> Peter has started to fix up this naming mess in qemu.  I guess we should
> do the same for the kernel (except for ABIs) and document it, because it
> keeps generating confusion.

Ok.
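
For reference, here is a rough toy model of the acceptance handshake
described above (the APIC presents, the core accepts based on the
interrupt enable flag and CR8, then the APIC updates ISR and IRR).
It is only a sketch under simplifying assumptions, not KVM or qemu
code: every name in it (toy_lapic, toy_core, toy_core_accept, ...) is
made up for illustration, and the priority check is reduced to
"the vector's class must exceed both CR8 and the highest in-service
class".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy local APIC: one bit per vector (256 vectors). */
struct toy_lapic {
    uint8_t irr[32];   /* interrupt request register: pending vectors */
    uint8_t isr[32];   /* in-service register: accepted, not yet EOI'd */
};

/* Hypothetical toy core state relevant to interrupt acceptance. */
struct toy_core {
    bool    if_flag;   /* interrupt enable flag */
    uint8_t cr8;       /* task priority class (0..15) */
};

static void toy_set(uint8_t *map, int vec)
{
    assert(vec >= 0 && vec < 256);
    map[vec / 8] |= (uint8_t)(1u << (vec % 8));
}

static void toy_clear(uint8_t *map, int vec)
{
    assert(vec >= 0 && vec < 256);
    map[vec / 8] &= (uint8_t)~(1u << (vec % 8));
}

/* Highest vector set in the bitmap, or -1 if the bitmap is empty. */
static int toy_highest(const uint8_t *map)
{
    for (int vec = 255; vec >= 0; vec--)
        if (map[vec / 8] & (1u << (vec % 8)))
            return vec;
    return -1;
}

/* A device (or another CPU) raises a vector: it becomes pending in IRR. */
static void toy_lapic_request(struct toy_lapic *l, int vec)
{
    toy_set(l->irr, vec);
}

/*
 * The synchronous step: the APIC presents its highest pending vector and
 * the core accepts it only if interrupts are enabled and the vector's
 * priority class (vec >> 4) is above both CR8 and the highest in-service
 * class.  On acceptance the APIC moves the vector from IRR to ISR.
 * Returns the accepted vector, or -1 if nothing was accepted.
 */
static int toy_core_accept(const struct toy_core *c, struct toy_lapic *l)
{
    int vec = toy_highest(l->irr);
    int in_service = toy_highest(l->isr);
    int ppr = c->cr8;

    if (in_service >= 0 && (in_service >> 4) > ppr)
        ppr = in_service >> 4;

    if (!c->if_flag || vec < 0 || (vec >> 4) <= ppr)
        return -1;

    toy_clear(l->irr, vec);
    toy_set(l->isr, vec);
    return vec;
}

/* End of interrupt: retire the highest in-service vector, if any. */
static void toy_lapic_eoi(struct toy_lapic *l)
{
    int vec = toy_highest(l->isr);

    if (vec >= 0)
        toy_clear(l->isr, vec);
}
```

The point of the sketch is that toy_core_accept() needs both the core
state (IF, CR8) and the APIC state (IRR, ISR) at the same instant,
which is the coupling Avi describes for the local APIC.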

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
