> On Jul 12, 2019, at 3:12 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>
> On 7/11/19 2:07 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>> Hi Qian,
>>>
>>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>>> Could you please share more details about your test? How often did you
>>> run into this problem?
>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here
>> is some more information.
>>
>> # cat .config
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case.
>
> According to the bug call trace in the earlier email, it looks like deferred_split_scan lost a race with put_compound_page. put_compound_page would call free_transhuge_page(), which deletes the page from the deferred split queue, but the page may still appear on the deferred list for some reason.
>
> Would you please try the below patch?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..66bd9db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>  		if (!list_empty(page_deferred_list(head))) {
>  			ds_queue->split_queue_len--;
> -			list_del(page_deferred_list(head));
> +			list_del_init(page_deferred_list(head));
>  		}
>  		if (mapping)
>  			__dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  	if (!list_empty(page_deferred_list(page))) {
>  		ds_queue->split_queue_len--;
> -		list_del(page_deferred_list(page));
> +		list_del_init(page_deferred_list(page));
>  	}
>  	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);

Unfortunately, I am no longer able to reproduce the original list corruption with today’s linux-next.