Re: [PATCH 1/11] Add generic helpers for arch IPI function calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 23 2008, Mark Lord wrote:
> >>>Date:	Thu, 15 Nov 2007 12:07:48 -0500
> >>>From: Mark Lord <lkml@xxxxxx>
> >>>To: Greg KH <gregkh@xxxxxxx>
> >>>Cc: Yasunori Goto <y-goto@xxxxxxxxxxxxxx>,
> >>>	Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>,
> >>>	Alexey Dobriyan <adobriyan@xxxxx>, linux-kernel@xxxxxxxxxxxxxxx
> >>>Subject: Re: EIP is at device_shutdown+0x32/0x60
> >>>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >>>Content-Transfer-Encoding: 7bit
> >>>Sender: linux-kernel-owner@xxxxxxxxxxxxxxx
> >>>
> >>>... < snip > ...
> >>>
> >>>Greg, I don't know if this is relevant or not,
> >>>but x86 has bugs in the halt/reboot code for SMP.
> >>>
> >>>Specifically, in native_smp_send_stop() the code now uses
> >>>spin_trylock() to "lock" the shared call buffers,
> >>>but then ignores the result.
> >>>
> >>>This means that multiple CPUs can/will clobber each other
> >>>in that code.
> >>>
> >>>The second bug, is that this code does not wait for the
> >>>target CPUs to actually stop before it continues.
> >>>
> >>>This was the real cause of the failure-to-poweroff problems
> >>>I was having with 2.6.23, which we fixed by using CPU hotplug
> >>>to disable_nonboot_cpus() before the above code ever got run.
> 
> Jens Axboe wrote:
> >On Tue, Apr 22 2008, Mark Lord wrote:
> >>Jens,
> >>
> >>While you're in there, :)
> >>
> >>Could you perhaps fix this bug (above) if it still exists?
> >
> >I don't understand the bug - what are the shared call buffers you are
> >talking of?
> >
> >With the changes, there's not even an spin_trylock() in there anymore.
> >But I don't see the original bug either, so...
> ..
> 
> arch/x86/kernel/smp.c:
> 
> static void native_smp_send_stop(void)
> {
>        int nolock;
>        unsigned long flags;
> 
>        if (reboot_force)
>                return;
> 
>        /* Don't deadlock on the call lock in panic */
>        nolock = !spin_trylock(&call_lock);       <<<<<<<<<< buggy
>        local_irq_save(flags);
>        __smp_call_function(stop_this_cpu, NULL, 0, 0);
>        if (!nolock)
>                spin_unlock(&call_lock);
>        disable_local_APIC();
>        local_irq_restore(flags);
> }
> 
> The spinlock is trying to protect access to the global variable
> "call_data" (higher up in the same file), which is used
> in __smp_call_function() and friends.
> 
> But since the spinlock is ignored in this case,
> the global "call_data will get clobbered if it was already in-use.

Ah I see, that bug doesn't exist with the converted code.

> The second bug, is that for the halt case at least,
> nobody waits for the other CPU to actually halt
> before continuing.. so we sometimes enter the shutdown
> code while other CPUs are still active.
> 
> This causes some machines to hang at shutdown,
> unless CPU_HOTPLUG is configured and takes them offline
> before we get here.

I'm guessing there's a reason it doesn't pass '1' as the last argument,
because that would fix that issue?

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux