Re: scheduling while atomic acpi_idle_enter_bm

"Luis R. Rodriguez" <mcgrof@xxxxxxxxx> · Thu, 5 Nov 2009 17:50:20 -0800

On Thu, Nov 5, 2009 at 5:23 PM, ykzhao <yakui.zhao@xxxxxxxxx> wrote:
> On Tue, 2009-11-03 at 11:09 +0800, Luis R. Rodriguez wrote:
>> On Mon, Nov 2, 2009 at 7:02 PM, Len Brown <lenb@xxxxxxxxxx> wrote:
>> >> > I get this when modprobing some module I am working on. I figured it
>> >> > was the module's fault but the EIP points to something else so I am
>> >> > not sure. I get the following repeating about 4 times on 2.6.32-rc5:
>> >>
>> >>
>> >> you can get this if your own code leaves interrupts disabled in a
>> >> kernel thread and then lets the cpu go idle...
>> >
>> > Unclear.
>> >
>> > acpi_idle_enter_bm() assumes that it is entered with irqs enabled,
>> > and so it unconditionally disables IRQs.
>> >
>> > Then we unconditionally re-enable them.
>> >
>> > The problem seems to be that right after we enable them,
>> > we find that they are actually disabled, perhaps as
>> > a side-effect of SMM.
>> >
>> > Is your machine a Dell, per chance?
>>
>> Nope.
>>
>> > Please test the patches in this bug report:
>> > http://bugzilla.kernel.org/show_bug.cgi?id=14101
>>
>> In my case it was as Arjan pointed out and I've fixed it in my driver.
>> Sorry for not reporting back and thanks for your review.
> Hi, Luis
>   It is great that this issue is fixed in your driver.
> But there seem to be many similar reports on kerneloops.
>   >BUG: scheduling while atomic: swapper/0/0x10000100
>   >Call Trace:
>  [<ffffffff812d2efa>] ? acpi_idle_enter_bm+0x284/0x2bf
>  [<ffffffff813f931b>] ? cpuidle_idle_call+0x9b/0xf0
>  [<ffffffff81010e12>] ? cpu_idle+0xb2/0x100
>
>   >BUG: scheduling while atomic: swapper/0/0x10010000
>   >Call Trace:
>  [<ffffffff812d2efa>] ? acpi_idle_enter_bm+0x284/0x2bf
>  [<ffffffff813f931b>] ? cpuidle_idle_call+0x9b/0xf0
>  [<ffffffff81010e12>] ? cpu_idle+0xb2/0x100
>  [<ffffffff8151de43>] ? start_secondary+0xa9/0xab
>
> From the above log it seems that the preempt_count is 0x10010000,
> which means that this happens in softirq.

What's the preempt_count and how does it get changed?
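
For what it's worth, the value in the BUG line can be decoded by hand.
Below is a minimal userspace sketch, not kernel code. It assumes the bit
layout described in include/linux/hardirq.h for kernels of this vintage
and the x86 value of PREEMPT_ACTIVE (0x10000000); please check both
against your tree. The struct and function names are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: field layout as commented in include/linux/hardirq.h around
 * 2.6.32, plus PREEMPT_ACTIVE as defined for x86. Other kernel versions
 * and architectures may differ.
 *
 * Roughly: preempt_disable()/preempt_enable() change the low byte,
 * local_bh_disable() and softirq processing change the softirq field,
 * and irq_enter()/irq_exit() change the hardirq field.
 */
#define PREEMPT_MASK   0x000000ffu  /* bits 0-7:   preemption depth */
#define SOFTIRQ_MASK   0x0000ff00u  /* bits 8-15:  softirq count    */
#define HARDIRQ_MASK   0x03ff0000u  /* bits 16-25: hardirq count    */
#define NMI_MASK       0x04000000u  /* bit 26:     in NMI           */
#define PREEMPT_ACTIVE 0x10000000u  /* bit 28 on x86 (assumed)      */

/* Hypothetical helper type: one field per part of preempt_count. */
struct preempt_fields {
    unsigned preempt_depth;
    unsigned softirq_count;
    unsigned hardirq_count;
    bool nmi;
    bool preempt_active;
};

/* Split a raw preempt_count value, as printed in the BUG line, into fields. */
static struct preempt_fields decode_preempt_count(uint32_t pc)
{
    struct preempt_fields f;

    f.preempt_depth  = pc & PREEMPT_MASK;
    f.softirq_count  = (pc & SOFTIRQ_MASK) >> 8;
    f.hardirq_count  = (pc & HARDIRQ_MASK) >> 16;
    f.nmi            = (pc & NMI_MASK) != 0;
    f.preempt_active = (pc & PREEMPT_ACTIVE) != 0;
    return f;
}

/* True if any interrupt-context field (softirq, hardirq, NMI) is non-zero. */
static bool in_interrupt_context(uint32_t pc)
{
    return (pc & (SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) != 0;
}
```

Under that assumed layout, 0x10000100 from the first trace decodes to a
softirq count of 1, and 0x10010000 from the second trace decodes to a
hardirq count of 1, both with PREEMPT_ACTIVE set.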

> After the CPU wakes from the C-state, interrupts are re-enabled.
> It can then run the ISR and softirq if an interrupt is triggered.
> Is the above issue caused by might_sleep() being called in the ISR/softirq?

Think so.

>   Can you describe how you fixed this issue in your driver? It would be great if you
> could give us some example code that can trigger this issue.

You can view the git commit here:

http://tinyurl.com/add-rx-support-ath9k-htc

It's a bit big, but anything that has to do with the mutex -> spinlock change is what fixed it.

Let me summarize what I did.

I took Arjan's tip at face value:

"you can get this if your own code leaves interrupts disabled in a
kernel thread and then lets the cpu go idle..."

So I went looking for code of mine that might do this. In my case my
USB IRQ handler was taking a nap on a mutex somewhere down the
pipeline: once the workqueue had been kicked off and had grabbed the
mutex with mutex_lock(), the ISR would contend for the same mutex and
sleep.

I changed the ISR code to use spin_lock_irqsave() while it pumps skbs
into an skb queue I had set up. I first changed the workqueue handler
that eats the skbs on that queue to use spin_lock_bh(), but that is
also wrong, so I changed it to spin_lock_irqsave() as well.

FWIW the git tree is at:

git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/ath9k_htc.git

and the commit was 88f284ae6a6a7ed7404bcf52c1a5f0692b01ea7f

  Luis