Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

Zachary Amsden <zach@xxxxxxxxxx> · Thu, 08 Mar 2007 00:01:13 -0800

Thomas Gleixner wrote:
> On Wed, 2007-03-07 at 17:01 -0800, Daniel Arai wrote:
>   
>> But more importantly, we want a kernel that can run both on native hardware and 
>> in a paravirtualized environment.  Linux doesn't really provide abstractions for 
>>   replacing the appropriate code.  We tried to hook into the source code at a 
>> level that seemed possible.
>>     
>
> Again. You just refuse to change your implementation and you want to
> keep it by arguing how hard it is because there are no abstractions.
>   

It is no longer possible to change our _hypervisor_ implementation.  The 
Linux side of our code is entirely flexible, and we are trying to change 
it, but it hasn't always been clear what you want us to do.

> Your prayer wheel argument of missing abstractions and easiness of
> emulating things is annoying. If you think it is better to emulate APIC,
> please emulate it without paravirt ops. If you want the speed
> improvement, work with us to create the interfaces and abstractions
> which are necessary to have a sane, maintainable and useful for all
> hypervisors implementation.
>   

That's what we are doing.  Our prayer wheel would be easier appeased if 
you actually told us which parts of the VMI timer you objected to.  As I 
understand it now:

1) We should not call into external functions in other time sources; any 
common code should be merged up
2) We should not be using global_clock_event; it is a horrible hack 
which you want to remove
3) We should not use the smp_apic_timer_interrupt assembly code which 
calls up to the lapic timer handlers
4) We should not add our own assembly code to call out to a local timer 
handler (from Ingo)

These last two points create a conflict which is a little tricky to 
solve.  We can't add our own custom timer handler, and we can't re-use 
the APIC timer handler.  But there is no timer handler available on i386 
that works, since the handlers will fall back to either PIC or IO-APIC 
edge handling.  Using either of those for the local timer interrupt on 
SMP does not work because they assume traditional IRQ semantics - an IRQ 
raised from the bus should be serviced by one processor.  Re-raises of 
the same IRQ on remote processors are locked out by the handler, and 
dropped.  Thus simultaneous local timers firing on multiple CPUs cause 
only one to be serviced.

This does not work for local timer interrupts in NO_HZ mode, because 
they must always be serviced so that they can reschedule the next local 
timer.  I have a proposed solution to this issue, but it fails to work 
when the IO-APIC assumes control of all IRQs based on ACPI results 
(which we control, but can't change because of compatibility issues with 
other operating systems).

My proposal is to keep IRQ-0 as the timer interrupt, on all CPUs, but 
fire it from the LAPIC after local apic timers get initialized.  We 
would do this by converting the irq handler using set_irq_handler(0, 
handle_percpu_irq).  The only problem is the IO-APIC code will want to 
take over IRQ0 and convert it to an edge triggered IO-APIC interrupt.  
But for the local irq handlers to work, we have to keep them using the 
handle_percpu_irq handler, and can't let the IO-APIC steal these 
vectors.  There is no way to do conditionally for just a specific set of 
IRQs in tree today, so we would need to add a special case to io_apic.c 
to allow early boot code to reserve specific vectors so they are not 
subsumed by the IO-APIC.  This seems reasonable, but is a special case.

If, on the other hand, we are allowed to use our own assembly code to 
call out to our local timer handler (dropping constraint #4 above), we 
can simply rewire LOCAL_TIMER_VECTOR to point to this code, but now we 
must emulate the semantics of irq_enter / leave / etc inside our code, 
which is also not the cleanest solution.  We used to do this, and it 
caught flak I believe from Ingo.

The basic problem is that a local IRQ doesn't behave like a global IRQ, 
and the i386 backend is unaware of how to set up any local IRQs except 
in the case of local APIC, but you have told us we should not re-use the 
APIC handlers by overloading global_clock_event.  The patches we sent 
out recently did just this, but seemed to meet even more violence than 
our previous way of doing things.

So the question is, which approach do you prefer?

Zach
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxx
https://lists.osdl.org/mailman/listinfo/virtualization