Rafael Aquini <aquini@xxxxxxxxxx> writes:

> The bug here is quite simple: split_swap_cluster() misses checking for
> lock_cluster() returning NULL before committing to change cluster_info->flags.

I don't think so. We shouldn't run into this situation in the first
place. So the "fix" hides the real bug instead of fixing it. This is
just like calling VM_BUG_ON_PAGE(!PageLocked(head), head) in
split_huge_page_to_list() instead of silently returning if
!PageLocked(head).

> The fundamental problem has nothing to do with allocating, or not allocating
> a swap cluster, but it has to do with the fact that the THP deferred split scan
> can transiently race with swapcache insertion, and the fact that when you run
> your swap area on rotational storage cluster_info is _always_ NULL.
> split_swap_cluster() needs to check for lock_cluster() returning NULL because
> that's one possible case, and it clearly fails to do so.

If there's a race, we should fix the race. But the code path for
swapcache insertion is:

add_to_swap()
  get_swap_page() /* Return if fails to allocate */
  add_to_swap_cache()
    SetPageSwapCache()

while the code path to split a THP is:

split_huge_page_to_list()
  if PageSwapCache()
    split_swap_cluster()

Both code paths are protected by the page lock, so the bug must be
triggered by something else.

And again, on an HDD, a THP shouldn't have PageSwapCache() set in the
first place. If it does, the bug is that the flag gets set, and we
should fix the code that sets it.

> Run a workload that causes multiple THP COW, and add a memory hogger to create
> memory pressure so you'll force the reclaimers to kick the registered
> shrinkers. The trigger is not heavy swapping, and that's probably why
> most swap test cases don't hit it. The window is tight, but you will get the
> NULL pointer dereference.

Do you have a script to reproduce the bug?

> Regardless of whether you find further bugs, this patch is needed to correct a
> blunt coding mistake.

As above, I don't agree with that.

Best Regards,
Huang, Ying
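To make the invariant argued above concrete, here is a rough userspace model. It is not kernel code and not a reproducer. Every name (model_page, model_add_to_swap(), model_split_huge_page(), CLUSTER_FLAG_HUGE, etc.) is a hypothetical stand-in for the kernel functions mentioned in the thread. The model assumes, as the reply states, that THP swap allocation fails when the device has no cluster_info (e.g. an HDD), and that both paths run with the page lock held. Under those assumptions, PageSwapCache() implies a non-NULL cluster, so split_swap_cluster() asserts (like VM_BUG_ON) rather than silently returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the two code paths in this thread.
 * Names mirror the kernel functions, but this is NOT kernel code. */

#define CLUSTER_FLAG_HUGE 1u

struct cluster_info { unsigned int flags; };

struct model_page {
    bool locked;              /* models PageLocked() */
    bool swapcache;           /* models PageSwapCache() */
    struct cluster_info *ci;  /* cluster backing the THP's swap entry */
};

static void lock_page(struct model_page *p)   { assert(!p->locked); p->locked = true; }
static void unlock_page(struct model_page *p) { assert(p->locked);  p->locked = false; }

/* Models add_to_swap() for a THP. dev_ci == NULL models a rotational
 * device where (per the reply, an assumption here) THP allocation fails. */
static bool model_add_to_swap(struct model_page *p, struct cluster_info *dev_ci)
{
    assert(p->locked);                /* caller holds the page lock */
    if (dev_ci == NULL)
        return false;                 /* get_swap_page() fails: flag never set */
    dev_ci->flags |= CLUSTER_FLAG_HUGE;
    p->ci = dev_ci;
    p->swapcache = true;              /* add_to_swap_cache() -> SetPageSwapCache() */
    return true;
}

/* Models split_swap_cluster(): a NULL cluster is treated as an invariant
 * violation and asserted on, instead of being hidden by an early return. */
static void model_split_swap_cluster(struct model_page *p)
{
    struct cluster_info *ci = p->ci;  /* models lock_cluster() */
    assert(ci != NULL);
    ci->flags &= ~CLUSTER_FLAG_HUGE;
}

/* Models split_huge_page_to_list(): touches the cluster only when
 * PageSwapCache is set, and only with the page lock held. */
static void model_split_huge_page(struct model_page *p)
{
    assert(p->locked);                /* like VM_BUG_ON_PAGE(!PageLocked(head), head) */
    if (p->swapcache)
        model_split_swap_cluster(p);
}
```

In the HDD case the split path never reaches model_split_swap_cluster(), because the flag was never set. If the assert ever fired, the bug would be whatever set PageSwapCache() without a cluster, which is the point being made above.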