Re: kernel BUG at mm/huge_memory.c:2736(linux 5.10.29)

Hi,

> On Fri, Apr 23, 2021 at 1:07 AM Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > > With this patch, the problem yet not happen after 4 tests(5.10.x).
> >
> > With this patch , another problem happened at 6th test.
> >
> > kernel BUG at mm/huge_memory.c:2343!
> > static void unmap_page(struct page *page)
> > {
> >     enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
> >         TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
> >     bool unmap_success;
> >
> >     VM_BUG_ON_PAGE(!PageHead(page), page);
> >
> >     if (PageAnon(page))
> >         ttu_flags |= TTU_SPLIT_FREEZE;
> >
> >     unmap_success = try_to_unmap(page, ttu_flags);
> > L2343:VM_BUG_ON_PAGE(!unmap_success,page);
> 
> Thanks for running the test. This is what I expected from the debug
> patch. It means try_to_unmap() didn't unmap the huge page
> successfully. The huge page is PTE-mapped, try_to_unmap() is supposed
> to unmap every mapped subpage. But it seems it didn't unmap any
> subpage at all (the refcount of the huge page is 512 per the log from
> earlier email).
> 
> By reading the code, I didn't figure out what went wrong yet. You
> mentioned that the 5.4.x kernel is fine, so may you try to do some
> bisect?

This may be happening on some memory reclaim path.

Our application needs to process files of about 300G-400G.

We have 4 servers: two servers have 192G of memory, one server has 512G,
and one server has 768G.

If the available memory (total memory * 10 / 12 - 120G) is enough to
process the files, no temp file is needed. Otherwise, we write the buffer
to a temp file and continue processing the next part.
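
To make the numbers concrete, here is a minimal sketch of that check. The
function names are hypothetical, sizes are in G, and it assumes "enough"
simply means the file size does not exceed the budget; the real
application logic may differ.

```c
#include <stdbool.h>

/* Fixed amount subtracted in the formula above, in G. */
#define RESERVED_GB 120.0

/* Buffer budget: total memory * 10 / 12 - 120G (hypothetical helper). */
double buffer_budget_gb(double total_gb)
{
    return total_gb * 10.0 / 12.0 - RESERVED_GB;
}

/* Assumption: a temp file is needed when the file exceeds the budget. */
bool needs_temp_file(double total_gb, double file_gb)
{
    return file_gb > buffer_budget_gb(total_gb);
}
```

Under these assumptions the 192G servers get a budget of 40G, so every
300G-400G file spills to temp files. The 768G server gets 520G and never
spills, and the 512G server gets about 306.7G, so it spills only for
files larger than that.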

This problem happened on the server with 192G memory and kernel 5.10.x,
but has not yet happened on the servers with kernel 5.4.x or with
total memory >= 512G.

So this may be a timing problem too. Maybe debug code would be more useful than a code bisect?

Fedora configures new Linux kernels with CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y,
so maybe new Linux kernels with CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y are not
well tested?
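
The CONFIG option only sets the boot-time default. The mode actually in
effect can be read from /sys/kernel/mm/transparent_hugepage/enabled, which
prints a line like "always [madvise] never" with the active mode in
brackets. A small sketch for checking it on each server (the helper names
are hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed word of a line such as "always [madvise] never"
 * into out. Returns 0 on success, -1 if no non-empty [..] pair is found
 * or the word does not fit in out.
 */
int thp_active_mode(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len == 0 || len >= out_len)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the sysfs file and extract the active mode; -1 on any failure. */
int thp_read_mode(char *out, size_t out_len)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? thp_active_mode(line, out, out_len) : -1;
}
```

Calling thp_read_mode() on each server would show whether the 5.10.x
and 5.4.x machines really run with the same THP mode.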

Best Regards
Wang Yugui (wangyugui@xxxxxxxxxxxx)
2021/04/24




