On Fri, Jul 17, 2009 at 08:32:00PM +0300, Gleb Natapov wrote: > > There are a few issues that kvm_set_irq return code cannot handle, > > AFAICS (please correct me if i'm wrong, or provide useful points for > > discussion). > > > > case 1) > > 1. kvm_set_irq, delivered OK (IOW, assume guest has acked the > > interrupt, next_timer_event += timer_period). > > 2. hlt -> kvm_vcpu_block thinks it can sleep, because next injectable > > timer event is the future. > > 3. which might not be the case, since we are not certain the guest > > has acked the interrupt. > > > > case 2) > > 1. kvm_set_irq, delivered OK. > > 2. guest masks irq. > > 3. here you do no want to unhalt the vcpu on the next timer event, > > because irq is masked at irqchip level. > > > > So you need both ack notification and mask notification for proper > > emulation of UNHALT. Could check whether the interrupt is masked on > > injection and avoid UNHALT for case 2), though. > > > > But to check that, you need to look into both PIC and IOAPIC pins, which > > can happen like: > > > > pic_lock > > check if pin is masked > > pic_unlock > > > > ioapic_lock > > check if pin is masked > > ioapic_unlock > And what if interrupts are blocked by vcpu? You don't event know what > cpu will receive the interrupt (may be several?). Timers get bound to vcpus. If the IOAPIC is programmed to multiple vcpus, you bind one timer instance to each target vcpu. So it will need "target vcpu notifiers" (good point). > > And only then, if interrupt can reach the processor, UNHALT the vcpu. > > This could avoid the need to check for PIC/IOAPIC state inside the PIT / > > other timers mask notifiers (regarding the current locking discussion). > > > Timer is just a device that generates interrupt. You don't check in > a device emulation code if CPU is halted and should be unhalted and > timer code shouldn't do that either. The only check that decide if vcpu > should be unhalted is in kvm_arch_vcpu_runnable() and it goes like this: > if there is interrupt to inject and interrupt are not masked by vcpu or > there is nmi to inject then unhalt vcpu. > So what the timer code should do is to call kvm_set_irq(1) and if > interrupt (or nmi) will be delivered to vcpu as a result of this the > above check will unhalt vcpu. OK, can use kvm_arch_vcpu_runnable in the comparison. > If you want to decide in timer code what should be done to vcpu as a > result of a timer event you'll have to reimplement entire pic/ioapic/lapic > logic inside it. What if timer interrupt configured to send INIT to vcpu? It'll go through kvm_set_irq (expect for LAPIC timer). > Both cases you described are problematic only because you are trying to > make interrupt code too smart. > > > Those checks can race with another processor unmasking the pin for > > the irqchips, but then its a guest problem. > Sane OSse serialize access to pic/ioapic. I think we can assume this in > our code. > > > > > You still need the mask notifier to be able to reset the reinjection > > logic on unmask (see 4780c65904f0fc4e312ee2da9383eacbe04e61ea). Is that > > the case thats broken with RTC in QEMU? > > > Yes, interrupt masking is broken with RTC in QEMU, but only because QEMU > wanted it like this. In KVM if kvm_set_irq() returns zero it means that > interrupt was masked so reinjection logic can be reset. > > > Also, the fact that you receive an ack notification allows you to make > > decisions of behaviour between injection and ack. You can also measure > > the delay between these two events, which is not used at the moment > > but is potentially useful information. > You can use ack notifier for this, but it doesn't require to call back > into pic/ioapic code. Agreed. > > > > You do not have that information on AEOI style interrupts (such as timer > > delivery via PIT->LVT0 in NMI mode or PIC AEOI), but there's nothing > > you can to help such cases. > > > > > > (it could check whether a GSI is masked or not on both PIC/IOAPIC/LAPIC > > > > from a different context other than mask notifier). > > > > > > > > > But I think I found the way to not drop the lock here. > > > > > > > > See, this is adding more complexity than simplifying things (the > > > > recursive check). > > > This is not more complex than current code. (IMHO) > > > > Maybe there is a better way to get rid of that recursion check, maybe > > move the notifiers to be called after internal irqchip state has been > > updated, outside the lock? > > > Impressible (IOW I don't see how it could be done). The ack notifier is > the one who drives line low, so it should be called before irr is > checked for pending interrupts. Basically the same scenario I described > for pic in my previous mail. Flargunbastic. You're right. > > > had a chance to clear interrupt condition in a device. Or do I miss > > > something here? > > > > No, it looks wrong. The ack notification should be inside pic_clear_isr. > > > So you agree with me that the current code is completely broken? The > "no" at the beginning confuses me :) Yes its completly broken. > Otherwise I agree with you the ack notifiers should be inside > pic_clear_isr(). > > > And it is not there already because kvm_notify_acked_irq/kvm_vcpu_kick > > could not be called under spinlocks. > Why kvm_notify_acked_irq() cannot be called under spinlocks? > kvm_vcpu_kick can be called under spinlocks now. Agreed. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html