Douglas Anderson <dianders@xxxxxxxxxxxx> writes:

> Currently when we try to do page migration and we're in "synchronous"
> mode (and not doing direct compaction) then we'll wait an infinite
> amount of time for a page lock. This does not appear to be a great
> idea.
>
> One issue can be seen when I put a device under extreme memory
> pressure. I took a sc7180-trogdor Chromebook (4GB RAM, 8GB zram
> swap). I ran the browser along with Android (which runs from a
> loopback mounted 128K block-size squashfs "disk"). I then manually ran
> the mmm_donut memory pressure tool [1]. The system is completely
> unusable both with and without this patch since there are 8 processes
> completely thrashing memory, but it was still interesting to look at
> how migration was behaving. I put some timing code in and I could see
> that we sometimes waited over 25 seconds (in the context of
> kcompactd0) for a page lock to become available. Although the 25
> seconds was the high mark, it was easy to see tens, hundreds, or
> thousands of milliseconds spent waiting on the lock.
>
> Instead of waiting, if I bailed out right away (as this patch does), I
> could see kcompactd0 move forward to successfully migrate other pages
> instead. This seems like a better use of kcompactd's time.
>
> Thus, even though this didn't make the system any more usable in my
> absurd test case, it still seemed to make migration behave better and
> that feels like a win. It also makes the code simpler since we have
> one fewer special case.

TBH, the test case is too extreme for me.

And we have multiple "sync" modes to deal with latency requirements;
for example, we use MIGRATE_SYNC_LIGHT for compaction to avoid too
long a latency. If you have a latency requirement for some users, you
may consider adding a new "sync" mode.
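To make that concrete, here is a small standalone sketch (ordinary
userspace C, not kernel code) of how the existing modes bound the wait
for a contended page lock, and where a hypothetical new bounded-wait
mode could slot in. MIGRATE_SYNC_BOUNDED and lock_wait_budget_ms() are
invented for illustration; only the other three mode names mirror real
modes:

#include <stdbool.h>
#include <stdio.h>

enum migrate_mode {
	MIGRATE_ASYNC,		/* never block */
	MIGRATE_SYNC_LIGHT,	/* block, but avoid the most expensive waits */
	MIGRATE_SYNC,		/* block as long as it takes */
	MIGRATE_SYNC_BOUNDED,	/* hypothetical: block, but with a deadline */
};

/*
 * How long may we wait on a contended page lock?
 * 0 = trylock only (bail on contention), <0 = unbounded, >0 = cap in ms.
 * The first check mirrors the pre-patch policy visible in the diff
 * below: only MIGRATE_ASYNC and direct compaction (the PF_MEMALLOC
 * case) refuse to sleep on the lock.
 */
static long lock_wait_budget_ms(enum migrate_mode mode, bool direct_compaction)
{
	if (mode == MIGRATE_ASYNC || direct_compaction)
		return 0;
	if (mode == MIGRATE_SYNC_BOUNDED)
		return 100;	/* made-up latency cap */
	return -1;
}

int main(void)
{
	printf("kcompactd, SYNC_LIGHT budget:  %ld\n",
	       lock_wait_budget_ms(MIGRATE_SYNC_LIGHT, false));
	printf("direct compaction budget:      %ld\n",
	       lock_wait_budget_ms(MIGRATE_SYNC, true));
	return 0;
}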
Best Regards,
Huang, Ying

> The second issue that this patch attempts to fix is one that I haven't
> managed to reproduce yet. We have crash reports from the field that
> report that kcompactd0 was blocked for more than ~120 seconds on this
> exact lock. These crash reports are on devices running older kernels
> (5.10 mostly). In most of these crash reports the device is under
> memory pressure and many tasks were blocked in squashfs code, ext4
> code, or memory allocation code. While I don't know if unblocking
> kcompactd would actually have helped relieve the memory pressure, at
> least there was a chance that it could have helped a little bit.
>
> [1] https://chromium.googlesource.com/chromiumos/platform/microbenchmarks/+/refs/heads/main/mmm_donut.py
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
>
>  mm/migrate.c | 25 +++++++------------------
>  1 file changed, 7 insertions(+), 18 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index db3f154446af..dfb0a6944181 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1143,26 +1143,15 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>  	dst->private = NULL;
>
>  	if (!folio_trylock(src)) {
> -		if (mode == MIGRATE_ASYNC)
> -			goto out;
> -
>  		/*
> -		 * It's not safe for direct compaction to call lock_page.
> -		 * For example, during page readahead pages are added locked
> -		 * to the LRU. Later, when the IO completes the pages are
> -		 * marked uptodate and unlocked. However, the queueing
> -		 * could be merging multiple pages for one bio (e.g.
> -		 * mpage_readahead). If an allocation happens for the
> -		 * second or third page, the process can end up locking
> -		 * the same page twice and deadlocking. Rather than
> -		 * trying to be clever about what pages can be locked,
> -		 * avoid the use of lock_page for direct compaction
> -		 * altogether.
> +		 * While there are some modes we could be running in where we
> +		 * could block here waiting for the lock (specifically
> +		 * modes other than MIGRATE_ASYNC and when we're not in
> +		 * direct compaction), it's not worth the wait. Instead of
> +		 * waiting, we'll bail. This will let the caller try to
> +		 * migrate some other pages that aren't contended.
>  		 */
> -		if (current->flags & PF_MEMALLOC)
> -			goto out;
> -
> -		folio_lock(src);
> +		goto out;
>  	}
>  	locked = true;
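FWIW, the self-deadlock the removed comment describes can be modeled
in ordinary userspace C. This is only an analogy: an error-checking
pthread mutex stands in for the page lock, and all names are invented.
A normal mutex (or the real page lock) would simply hang at the second
acquisition; the error-checking type reports it instead:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	pthread_mutex_t page_lock;
	pthread_mutexattr_t attr;
	int err;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&page_lock, &attr);

	/* "Readahead" leg: the task locks the page and starts building a bio. */
	pthread_mutex_lock(&page_lock);

	/*
	 * Nested "allocation -> direct compaction" leg: the same task tries
	 * to take the same lock again before the first is released.
	 */
	err = pthread_mutex_lock(&page_lock);
	printf("second lock by same task: %s\n", strerror(err)); /* EDEADLK */

	pthread_mutex_unlock(&page_lock);
	pthread_mutex_destroy(&page_lock);
	pthread_mutexattr_destroy(&attr);
	return 0;
}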
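And here is a minimal sketch of the trylock-and-bail behavior the
patch adopts: a worker that skips contended items keeps making
progress on the rest of the set instead of sleeping on any one lock.
Again this is userspace C with invented names, not the kernel code
itself:

#include <pthread.h>
#include <stdio.h>

#define NITEMS 4

static pthread_mutex_t locks[NITEMS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* One pass: try each item once, skip (don't sleep on) contended ones. */
static int migrate_pass(void)
{
	int migrated = 0;

	for (int i = 0; i < NITEMS; i++) {
		if (pthread_mutex_trylock(&locks[i]) != 0)
			continue;	/* contended: bail, revisit later */
		/* ... migrate item i here ... */
		migrated++;
		pthread_mutex_unlock(&locks[i]);
	}
	return migrated;
}

int main(void)
{
	/* Simulate another task holding one of the locks. */
	pthread_mutex_lock(&locks[2]);
	printf("migrated %d of %d items\n", migrate_pass(), NITEMS);
	pthread_mutex_unlock(&locks[2]);
	return 0;
}

The caller can then retry the skipped items on a later pass, which is
effectively what the patch lets kcompactd do with contended pages.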