Hi,

> On Fri, Apr 23, 2021 at 1:07 AM Wang Yugui <wangyugui@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > > With this patch, the problem has not yet happened after 4 tests (5.10.x).
> >
> > With this patch, another problem happened at the 6th test.
> >
> > kernel BUG at mm/huge_memory.c:2343!
> >
> > static void unmap_page(struct page *page)
> > {
> >         enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
> >                 TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
> >         bool unmap_success;
> >
> >         VM_BUG_ON_PAGE(!PageHead(page), page);
> >
> >         if (PageAnon(page))
> >                 ttu_flags |= TTU_SPLIT_FREEZE;
> >
> >         unmap_success = try_to_unmap(page, ttu_flags);
> > 2343:   VM_BUG_ON_PAGE(!unmap_success, page);
>
> Thanks for running the test. This is what I expected from the debug
> patch. It means try_to_unmap() didn't unmap the huge page
> successfully. The huge page is PTE-mapped, so try_to_unmap() is
> supposed to unmap every mapped subpage. But it seems it didn't unmap
> any subpage at all (the refcount of the huge page is 512 per the log
> from the earlier email).
>
> By reading the code, I haven't figured out what went wrong yet. You
> mentioned that the 5.4.x kernel is fine, so could you try to do some
> bisecting?

This may be happening on some memory reclaim path.

Our application needs to process files of about 300G-400G. We have 4
servers: two servers have 192G of memory, one server has 512G, and one
server has 768G.

If the memory budget (total memory * 10 / 12 - 120G) is enough to
process the files, no temp file is needed; otherwise, we write the
buffer to a temp file and continue with another part.

This problem happened on the server with 192G of memory and kernel
5.10.x, but has not yet happened on the servers with kernel 5.4.x or
with total memory >= 512G. So this may be a timing problem, too.

Would debug code be more useful than a bisect?

Fedora configures new Linux kernels with
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y, so maybe new kernels with
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y are not as well tested?

Best Regards
Wang Yugui (wangyugui@xxxxxxxxxxxx)
2021/04/24
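
PS: in case debug code is indeed preferable to a bisect, below is a
minimal, untested sketch against the 5.10.x unmap_page() quoted above.
It only adds a pr_warn()/dump_page() before the VM_BUG_ON_PAGE() so
that the page's refcount, mapcount, and flags at failure time land in
the log. Everything beyond the quoted code is my assumption of what
would be useful to capture, not a tested patch.

/*
 * Untested debug sketch (not a fix): same as the quoted 5.10.x
 * unmap_page(), but dump the failing page's state before the BUG
 * fires. dump_page() and total_mapcount() are existing kernel
 * helpers; nothing else is changed.
 */
static void unmap_page(struct page *page)
{
        enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
                TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
        bool unmap_success;

        VM_BUG_ON_PAGE(!PageHead(page), page);

        if (PageAnon(page))
                ttu_flags |= TTU_SPLIT_FREEZE;

        unmap_success = try_to_unmap(page, ttu_flags);
        if (!unmap_success) {
                /* Record mapcount/refcount/flags for post-mortem. */
                pr_warn("unmap_page: try_to_unmap failed, total_mapcount=%d\n",
                        total_mapcount(page));
                dump_page(page, "still mapped after try_to_unmap");
        }
        VM_BUG_ON_PAGE(!unmap_success, page);
}

If the dump shows the subpages are still PTE-mapped, that would point
at try_to_unmap() itself; if the mapcount is zero but the refcount is
still 512, the extra references presumably come from somewhere else.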