On Tue, Aug 07, 2012 at 04:13:49PM +0300, Avi Kivity wrote:
> On 08/07/2012 03:14 PM, David Gibson wrote:
> > On Tue, Aug 07, 2012 at 11:46:35AM +0300, Avi Kivity wrote:
> >> On 08/07/2012 04:32 AM, David Gibson wrote:
> >> > On Tue, Aug 07, 2012 at 06:57:57AM +1000, Benjamin Herrenschmidt wrote:
> >> >> On Mon, 2012-08-06 at 13:13 +1000, David Gibson wrote:
> >> >> > So, I'm still trying to nut out the implications for H_CEDE, and think
> >> >> > if there are any other hypercalls that might want to block the guest
> >> >> > for a time. We were considering blocking H_PUT_TCE if qemu devices
> >> >> > had active dma maps on the previously mapped iovas. I'm not sure if
> >> >> > the discussions that led to the inclusion of the qemu IOMMU code
> >> >> > decided that was wholly unnecessary or just not necessary for the
> >> >> > time being.
> >> >>
> >> >> For "sleeping hcalls" they will simply have to set exit_request to
> >> >> complete the hcall from the kernel perspective, leaving us in a state
> >> >> where the kernel is about to restart at srr0 + 4, along with some other
> >> >> flag (stop or halt) to actually freeze the vcpu.
> >> >>
> >> >> If such an "async" hcall decides to return an error, it can then set
> >> >> gpr3 directly using ioctls before restarting the vcpu.
> >> >
> >> > Yeah, I'd pretty much convinced myself of that by the end of
> >> > yesterday. I hope to send patches implementing these fixes today.
> >> >
> >> > There are also some questions about why our in-kernel H_CEDE works
> >> > kind of differently from x86's hlt instruction implementation (which
> >> > comes out to qemu unless the irqchip is in-kernel as well). I don't
> >> > think we have an urgent problem there though.
> >>
> >> It's the other way round, hlt sleeps in the kernel unless the irqchip is
> >> not in the kernel.
> >
> > That's the same as what I said.
>
> I meant to stress that the normal way which other archs should emulate
> is sleep-in-kernel.

Ok.

> > We never have irqchip in kernel (because we haven't written that yet)
> > but we still sleep in-kernel for CEDE. I haven't spotted any problem
> > with that, but now I'm wondering if there is one, since x86 doesn't do
> > it in what seems like the analogous situation.
> >
> > It's possible this works because our decrementer (timer) interrupts
> > are different at the core level from external interrupts coming from
> > the PIC, and *are* handled in kernel, but I haven't actually followed
> > the logic to work out if this is the case.
> >
> >> Meaning the normal state of things is to sleep in
> >> the kernel (whether or not you have an emulated interrupt controller in
> >> the kernel -- the term irqchip in kernel is overloaded for x86).
> >
> > Uh.. overloaded in what way?
>
> On x86, irqchip-in-kernel means that the local APICs, the IOAPIC, and
> the two PICs are emulated in the kernel. Now the IOAPIC and the PICs
> correspond to non-x86 interrupt controllers, but the local APIC is more
> tightly coupled to the core. Interrupt acceptance by the core is an
> operation that involves synchronous communication with the local APIC:
> the APIC presents the interrupt, the core accepts it based on the value
> of the interrupt enable flag and possibly a register (CR8), then the
> APIC updates the ISR and IRR.
>
> The upshot is that if the local APIC is in userspace, interrupts must be
> synchronous with vcpu execution, so that KVM_INTERRUPT is a vcpu ioctl
> and HLT is emulated in userspace (so that local APIC emulation can check
> if an interrupt wakes it up or not).

Sorry, still not 100% getting it. When the vcpu is actually running
code, that synchronous communication must still be accomplished via
the KVM_INTERRUPT ioctl, yes? So what makes HLT different, that the
communication can't be accomplished in that case?

> As soon as the local APIC is
> emulated in the kernel, HLT can be emulated there as well, and
> interrupts become asynchronous (KVM_IRQ_LINE, a vm ioctl).
>
> So irqchip_in_kernel, for most discussions, really means whether
> interrupt queuing is synchronous or asynchronous. It has nothing to do
> with the interrupt controllers per se. All non-x86 archs always have
> irqchip_in_kernel() in this sense.
>
> Peter has started to fix up this naming mess in qemu. I guess we should
> do the same for the kernel (except for ABIs) and document it, because it
> keeps generating confusion.

Ok.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson