Re: Instability in current -git tree

On Fri, 13 Jul 2018 16:34:49 -0700 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, Jul 13, 2018 at 4:13 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > It does seem to be related to low-memory situation. Maybe page-out.
> > I'm wondering if it's one of the fairly scary MM patches from this
> > merge window
> 
> Woo-hoo! Yes, I got it to happen in text-mode.
> 
>   kernel BUG at mm/page_alloc.c:2016
> 
> with the call chain being
> 
> RIP: move_freepages_block()
> Call Trace:
>   steal_suitable_fallback
>   get_page_from_freelist
>   __alloc_pages_nodemask
>   new_slab
>   ___slab_alloc
>   __slab_alloc
>   kmem_cache_alloc
>   __d_alloc
>   d_alloc
>   ...
> 
> (and then it goes down to sys_openat and path lookup).
> 
> I actually used the dcache stress-tester and a stupid "allocate memory
> and keep dirtying it" program to get low on memory, and hit that
> d_alloc because of that.
> 
> And then when VM_BUG_ON() causes a do_exit(), you get a nested
> exception due to "sleeping function called from invalid context" in
> exit_signals. And then the machine is well and truly dead and f*cked.
> 
> I hate BUG_ON() calls. I wonder how many weeks ago it was that I
> complained about people adding BUG_ON() calls last?
> 
> Anyway, looks like core VM buggery. Now, I don't know *which* one of
> the multiple tests in that VM_BUG_ON() triggered,

They all did:

	VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
	          pfn_valid(page_to_pfn(end_page)) &&
	          page_zone(start_page) != page_zone(end_page));
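
Since the three sub-tests are joined with &&, the check can only fire when
all of them hold at once: both ends of the range have valid pfns and they
sit in different zones. A minimal userspace sketch of that logic follows.
struct model_page, its fields and range_check_fires() are hypothetical
stand-ins for illustration, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only what the check looks at. */
struct model_page {
	bool pfn_is_valid;	/* models pfn_valid(page_to_pfn(page)) */
	int zone_id;		/* models page_zone(page) */
};

/* Returns true when the modelled VM_BUG_ON() condition would fire. */
static bool range_check_fires(const struct model_page *start,
			      const struct model_page *end)
{
	return start->pfn_is_valid &&
	       end->pfn_is_valid &&
	       start->zone_id != end->zone_id;
}
```

In this model, a range with two valid ends in the same zone, or with either
end invalid, never trips the check.
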

> and I have no idea
> which commit caused it, but at least non-VM people can probably
> breathe a sigh of relief.

> Andrew, I suspect it's some of yours. Adding Willy, because some of
> the scariest ones in the VM layer are from him (like all those page
> member movement ones).
> 

Cc's added.  Pavel has been fiddling with this code lately.

The comment is interesting.

	/*
	 * page_zone is not safe to call in this context when
	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
	 * anyway as we check zone boundaries in move_freepages_block().
	 * Remove at a later date when no bug reports exist related to
	 * grouping pages by mobility
	 */

but we should work out why we're suddenly getting a range which crosses
zones before we just zap it.

(But it would be interesting to see whether removing the check "fixes" it)
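
For reference, the kind of zone-boundary test the comment alludes to can be
modelled roughly as below: round the pfn down to its pageblock and ask
whether both ends of that block fall inside the zone. This is a userspace
sketch under stated assumptions (a power-of-two block size of 512 pages, and
hypothetical model_* names); it is not the code in mm/page_alloc.c.

```c
#include <stdbool.h>

/* Assumed block size for the model; must be a power of two. */
#define MODEL_PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical stand-in for the zone fields the test needs. */
struct model_zone {
	unsigned long start_pfn;	/* first pfn in the zone */
	unsigned long spanned_pages;	/* number of pfns the zone spans */
};

/* True when pfn lies inside [start_pfn, start_pfn + spanned_pages). */
static bool model_zone_spans_pfn(const struct model_zone *z, unsigned long pfn)
{
	return pfn >= z->start_pfn && pfn < z->start_pfn + z->spanned_pages;
}

/* True when the whole pageblock containing pfn lies inside the zone. */
static bool model_block_within_zone(const struct model_zone *z,
				    unsigned long pfn)
{
	unsigned long first = pfn & ~(MODEL_PAGEBLOCK_NR_PAGES - 1);
	unsigned long last = first + MODEL_PAGEBLOCK_NR_PAGES - 1;

	return model_zone_spans_pfn(z, first) && model_zone_spans_pfn(z, last);
}
```

A zone whose start or end is not aligned to the block size is the case where
this returns false for pfns near the edge, which is the situation a
cross-zone range would have to come from in this model.
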



