On 09/25/22 20:11, Peter Xu wrote:
> On Sat, Sep 24, 2022 at 12:01:16PM -0700, Mike Kravetz wrote:
> > On 09/24/22 11:06, Peter Xu wrote:
> > >
> > > Sorry I forgot to reply on this one.
> > >
> > > I didn't try linux-next, but I can easily reproduce this with mm-unstable
> > > already, and I verified that Hugh's patch fixes the problem for shmem.
> > >
> > > When I was testing I found hugetlb selftest is broken too but with some
> > > other errors:
> > >
> > >         $ sudo ./userfaultfd hugetlb 100 10
> > >         ...
> > >         bounces: 6, mode: racing ver read, ERROR: unexpected write fault (errno=0, line=779)
> > >
> > > The failing check was making sure all MISSING events are not triggered by
> > > writes, but frankly I don't really know why it's required, and that check
> > > existed since the 1st commit when test was introduced.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c47174fc362a089b1125174258e53ef4a69ce6b8
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/vm/userfaultfd.c?id=c47174fc362a089b1125174258e53ef4a69ce6b8#n291
> > >
> > > And obviously some recent hugetlb-related change caused that to happen.
> > >
> > > Dropping that check can definitely work, but I'll have a closer look soon
> > > too to make sure I didn't miss something.  Mike, please also let me know if
> > > you are aware of this problem.
> >
> > Peter, I am not aware of this problem.  I really should make running ALL
> > hugetlb tests part of my regular routine.
> >
> > If you do not beat me to it, I will take a look in the next few days.
>
> Just to update - my bisection points to 00cdec99f3eb ("hugetlbfs: revert
> use i_mmap_rwsem to address page fault/truncate race", 2022-09-21).
>
> I don't understand how they are related so far, though.  It should be a
> timing thing because the failure cannot be reproduced on a VM but only on
> the host, and it can also pass sometimes even on the host but rarely.
Thanks Peter!  After your analysis, I also started looking at this.
- I did reproduce a few times in a VM
- On BM (a laptop) I could reproduce but it would take several (10's of) runs

> Logically all the uffd messages in the stress test should be generated by
> the locking thread, upon:
>
>   pthread_mutex_lock(area_mutex(area_dst, page_nr));

I personally find that test program hard to understand/follow.  It takes me
a day or so to understand what it is doing, then I immediately lose context
when I stop looking at it. :(  So, as you mention below, the program is
depending on pthread_mutex_lock() doing a read fault before a write.

> I thought a common scheme for lock() fast path should already be an
> userspace cmpxchg() and that should be a write fault already.
>
> For example, I did some stupid hack on the test and I can trigger the write
> check fault with anonymous easily with an explicit cmpxchg on byte offset 128:
>
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index 74babdbc02e5..a7d6938d4553 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -637,6 +637,10 @@ static void *locking_thread(void *arg)
>  		} else
>  			page_nr += 1;
>  		page_nr %= nr_pages;
> +		char *ptr = area_dst + (page_nr * page_size) + 128;
> +		char _old = 0, new = 1;
> +		(void)__atomic_compare_exchange_n(ptr, &_old, new, false,
> +				__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
>  		pthread_mutex_lock(area_mutex(area_dst, page_nr));
>  		count = *area_count(area_dst, page_nr);
>  		if (count != count_verify[page_nr])
>
> I'll need some more time thinking about it before I send a patch to drop
> the write check..

I did another stupid hack, and duplicated the statement:

	count = *area_count(area_dst, page_nr);

before the,

	pthread_mutex_lock(area_mutex(area_dst, page_nr));

This should guarantee a read fault independent of what pthread_mutex_lock
does.
However, it still results in the occasional "ERROR: unexpected write fault".
So, something else is happening.  I will continue to experiment and think
about this.
-- 
Mike Kravetz