Re: Sleeping BUG in khugepaged for i586

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/06/2017 05:01 PM, Larry Finger wrote:
> On 06/06/2017 09:02 AM, Vlastimil Babka wrote:
>> On 06/05/2017 11:44 PM, Andrew Morton wrote:
>>> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote:
>>>
>>>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which
>>>> requires a 32-bit kernel. In the log I found the following:
>>>>
>>>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
>>>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
>>>> 1 lock held by khugepaged/20:
>>>>    #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>]
>>>> collapse_huge_page.isra.47+0x439/0x1240
>>>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W
>>
>> W means thre was WARN earler. Could be related... Got logs?
> 
> When I grabbed a splat, I got the last one in my log. The first one shows "Not 
> tainted".
> 
>>
>>>> 4.12.0-rc1-wl-12125-g952a068 #80
>>
>> What is "wl-12125-g952a068"? What patches on top of mainline?
> 
> I found this while chasing a problem with one of the wireless drivers. For that 
> reason I use Kalle Valo's wireless-testing-next, which happens to be the only 
> kernel tree I have on this laptop. I'm reasonably certain that the extra updates 
> are not the cause of the problem as the first one appears before any of the 
> wireless drivers are loaded, but I will pull a clean copy of mainline to test 
> that assumption.
> 
>>>> Hardware name: Dell Computer Corporation Latitude D600
>>>> /03U652, BIOS A05 05/29/2003
>>>> Call Trace:
>>>>    dump_stack+0x76/0xb2
>>>>    ___might_sleep+0x174/0x230
>>>>    collapse_huge_page.isra.47+0xacf/0x1240
>>>>    khugepaged_scan_mm_slot+0x41e/0xc00
>>>>    ? _raw_spin_lock+0x46/0x50
>>>>    khugepaged+0x277/0x4f0
>>>>    ? prepare_to_wait_event+0xe0/0xe0
>>>>    kthread+0xeb/0x120
>>>>    ? khugepaged_scan_mm_slot+0xc00/0xc00
>>>>    ? kthread_create_on_node+0x30/0x30
>>>>    ret_from_fork+0x21/0x30
>>>>
>>>> I have no idea when this problem was introduced. Of course, I will test any
>>>> proposed fixes.
>>>>
>>>
>>> Odd.  There's nothing wrong with cond_resched() while holding mmap_sem.
>>> It looks like khugepaged forgot to do a spin_unlock somewhere and we
>>> leaked a preempt_count.
>>
>> Hmm I'd expect such spin lock to be reported together with mmap_sem in
>> the debugging "locks held" message?
> 
> My bisection of the problem is about half done. My latest good version is commit 
> 7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.

Hmm, your bisection will most likely just find commit 338a16ba15495
which added the cond_resched() at mm/khugepaged.c:655. CCing David who
added it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux