On 01/12/23 15:17, Huang, Ying wrote:
> Mike Kravetz <mike.kravetz@xxxxxxxxxx> writes:
>
> > On 01/12/23 08:09, Huang, Ying wrote:
> >> Hi, Mike,
> >>
> >> Mike Kravetz <mike.kravetz@xxxxxxxxxx> writes:
> >>
> >> > On 01/10/23 17:53, Mike Kravetz wrote:
> >> >> Just saw the following easily reproducible issue on next-20230110. Have not
> >> >> verified it is related to/caused by this series, but it looks suspicious.
> >> >
> >> > Verified this is caused by the series,
> >> >
> >> > 734cbddcfe72 migrate_pages: organize stats with struct migrate_pages_stats
> >> > to
> >> > 323b933ba062 migrate_pages: batch flushing TLB
> >> >
> >> > in linux-next.
> >>
> >> Thanks for reporting.
> >>
> >> I tried this yesterday (next-20230111), but failed to reproduce it. Can
> >> you share your kernel config? Is there any other setup needed?
> >
> > Config file is attached.
>
> Thanks!
>
> > Are you writing a REALLY big value to nr_hugepages? By REALLY big I
> > mean a value that is impossible to fulfill. This will result in
> > successful hugetlb allocations until __alloc_pages starts to fail. At
> > this point we will be stressing compaction/migration trying to find more
> > contiguous pages.
> >
> > Not sure if it matters, but I am running on a 2 node VM. The 2 nodes
> > may be important as the hugetlb allocation code will try a little harder
> > alternating between nodes that may perhaps stress compaction/migration
> > more.
>
> Tried again on a 2-node machine. Still cannot reproduce it.
>
> >> BTW: can you bisect to one specific commit which causes the bug in the
> >> series?
> >
> > I should have some time to isolate in the next day or so.

Isolated to patch,
[PATCH -v2 4/9] migrate_pages: split unmap_and_move() to _unmap() and _move()

Actually, recreated/isolated by just applying this series to v6.2-rc3 in an
effort to eliminate any possible noise in linux-next.

Spent a little time looking at modifications made there, but nothing stood out.
Will investigate more as time allows.
--
Mike Kravetz