On 5/10/21 11:45 PM, Axel Rasmussen wrote:
> Thanks for the investigation, Mike!
>
> Mina, since hugetlb_mcopy_atomic_pte is specific to userfaultfd, I'm
> happy to take a deeper look at it this week as well.
>
> For context, we have seen the WARN_ON Mina described trigger in
> production before, but were never able to reproduce it. The
> userfaultfd self test turns out to reproduce it reliably, so the
> thinking up to this point was that it just happened to reproduce some
> non-userfaultfd-specific issue. But from Mike's description, it seems
> this bug is very specific to userfaultfd after all. :)

Certainly, this case is userfaultfd specific.

However, I too recall seeing 'transient' underflows in the past, and I am
pretty sure those were not userfaultfd specific. Specifically, I recall
seeing transient underflow when working on commit 22146c3ce989
"hugetlbfs: dirty pages as they are added to pagecache". After fixing the
issue in 22146c3ce989, I could not reproduce transient underflows and
stopped looking for the cause.

We added code to a production kernel in an attempt to catch the issue:
https://github.com/oracle/linux-uek/commit/bd697676290d91762ef2bf79832f653b44e6f83b#diff-fb6066ca63d9afdc2e3660c85e2dcc04cf31b37900ca9df2a1019ee8fa80dce0

I'll try running some non-userfaultfd specific tests to see if I can
reproduce.
--
Mike Kravetz