Re: [PATCH] x86, nmi: workaround sti; hlt race vs nmi; intr

Avi Kivity <avi@xxxxxxxxxx> · Mon, 27 Sep 2010 16:17:15 +0200

On 09/27/2010 12:31 PM, Joerg Roedel wrote:
> On Sun, Sep 19, 2010 at 06:28:19PM +0200, Avi Kivity wrote:
>> On machines without monitor/mwait we use an sti; hlt sequence to atomically
>> enable interrupts and put the cpu to sleep.  The sequence uses the "interrupt
>> shadow" property of the sti instruction: interrupts are enabled only after
>> the instruction following sti has been executed.  This means an interrupt
>> cannot happen in the middle of the sequence, which would leave us with
>> the interrupt processed but the cpu halted.
>>
>> The interrupt shadow, however, can be broken by an nmi; the following
>> sequence
>>
>>     sti
>>       nmi ... iret
>>       # interrupt shadow disabled
>>       intr ... iret
>>     hlt
>>
>> puts the cpu to sleep, even though the interrupt may need additional
>> processing after the hlt (like scheduling a task).
>
> Doesn't the interrupt return path check for a re-schedule condition
> before iret? So to my belief the handler would not jump back to the
> idle task if something else becomes running in the interrupt handler,
> no?

Perhaps on preemptible kernels?  But at least on non-preemptible 
kernels, you can't just switch tasks while running kernel code.

void cpu_idle(void)
{
    current_thread_info()->status |= TS_POLLING;

    /*
     * If we're the non-boot CPU, nothing set the stack canary up
     * for us.  CPU0 already has it initialized but no harm in
     * doing it again.  This is a good place for updating it, as
     * we wont ever return from this function (so the invalid
     * canaries already on the stack wont ever trigger).
     */
    boot_init_stack_canary();

    /* endless idle loop with no priority at all */
    while (1) {
        tick_nohz_stop_sched_tick(1);
        while (!need_resched()) {

            rmb();

            if (cpu_is_offline(smp_processor_id()))
                play_dead();
            /*
             * Idle routines should keep interrupts disabled
             * from here on, until they go to idle.
             * Otherwise, idle callbacks can misfire.
             */
            local_irq_disable();
            enter_idle();
            /* Don't trace irqs off for idle */
            stop_critical_timings();
            pm_idle();
            start_critical_timings();

            trace_power_end(smp_processor_id());

            /* In many cases the interrupt that ended idle
               has already called exit_idle. But some idle
               loops can be woken up without interrupt. */
            __exit_idle();
        }

        tick_nohz_restart_sched_tick();
        preempt_enable_no_resched();
        schedule();
        preempt_disable();
    }
}

Looks like we rely on an explicit schedule() - pm_idle() is called with 
preemption disabled.

(pm_idle eventually calls safe_halt() if no other idle method is used)
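
To make the failure mode concrete, here is a rough user-space model of one
pass through the inner idle loop.  It is only a sketch, not kernel code:
struct cpu_model, deliver_irq(), model_safe_halt() and
idle_pass_reaches_schedule() are names made up for illustration, and the
model assumes a non-preemptible kernel where the interrupt return path
goes straight back to the interrupted instruction.

```c
#include <stdbool.h>

/* Toy model of the race; every name here is hypothetical. */
struct cpu_model {
    bool need_resched;  /* set when the interrupt handler wakes a task */
    bool irq_pending;   /* a device interrupt is waiting for delivery */
    bool halted;        /* cpu sits in hlt with nothing left to wake it */
};

/* Interrupt handler: wakes a task, then returns to the interrupted code. */
static void deliver_irq(struct cpu_model *c)
{
    c->irq_pending = false;
    c->need_resched = true;
}

/*
 * Model of "sti; hlt".  nmi_in_shadow says whether an nmi lands between
 * the two instructions, so that its iret ends the interrupt shadow early.
 */
static void model_safe_halt(struct cpu_model *c, bool nmi_in_shadow)
{
    if (nmi_in_shadow && c->irq_pending) {
        /* Shadow broken: the interrupt is delivered before hlt runs. */
        deliver_irq(c);
        /* hlt now executes with no interrupt left to wake the cpu. */
        c->halted = true;
        return;
    }

    if (c->irq_pending) {
        /* Shadow intact: hlt runs first, the interrupt then ends it. */
        deliver_irq(c);
        c->halted = false;
    } else {
        /* Nothing pending: sleeping is the correct outcome. */
        c->halted = true;
    }
}

/* One pass of the inner loop; true if control gets back to schedule(). */
static bool idle_pass_reaches_schedule(bool nmi_in_shadow)
{
    struct cpu_model c = {
        .need_resched = false,
        .irq_pending = true,
        .halted = false,
    };

    if (!c.need_resched)
        model_safe_halt(&c, nmi_in_shadow);

    /* A halted cpu never re-checks need_resched. */
    return !c.halted && c.need_resched;
}
```

With nmi_in_shadow false the pass reaches schedule(); with it true the
model ends up halted while need_resched is set, which is the case the
patch is meant to work around.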

--
error compiling committee.c: too many arguments to function
