Douglas Anderson <dianders@xxxxxxxxxxxx> writes: > The MIGRATE_SYNC_LIGHT mode is intended to block for things that will > finish quickly but not for things that will take a long time. Exactly > how long is too long is not well defined, but waits of tens of > milliseconds is likely non-ideal. > > Waiting on the folio lock in isolate_movable_page() is something that > usually is pretty quick, but is not officially bounded. Nothing stops > another process from holding a folio lock while doing an expensive > operation. Having an unbounded wait like this is not within the design > goals of MIGRATE_SYNC_LIGHT. > > When putting a Chromebook under memory pressure (opening over 90 tabs > on a 4GB machine) it was fairly easy to see delays waiting for the > lock of > 100 ms. While the laptop wasn't amazingly usable in this > state, it was still limping along and this state isn't something > artificial. Sometimes we simply end up with a lot of memory pressure. > > Putting the same Chromebook under memory pressure while it was running > Android apps (though not stressing them) showed a much worse result > (NOTE: this was on a older kernel but the codepaths here are > similar). Android apps on ChromeOS currently run from a 128K-block, > zlib-compressed, loopback-mounted squashfs disk. If we get a page > fault from something backed by the squashfs filesystem we could end up > holding a folio lock while reading enough from disk to decompress 128K > (and then decompressing it using the somewhat slow zlib algorithms). > That reading goes through the ext4 subsystem (because it's a loopback > mount) before eventually ending up in the block subsystem. This extra > jaunt adds extra overhead. Without much work I could see cases where > we ended up blocked on a folio lock for over a second. With more > more extreme memory pressure I could see up to 25 seconds. > > Let's bound the amount of time we can wait for the folio lock. The > SYNC_LIGHT migration mode can already handle failure for things that > are slow, so adding this timeout in is fairly straightforward. > > With this timeout, it can be seen that kcompactd can move on to more > productive tasks if it's taking a long time to acquire a lock. How long is the max wait time of folio_lock_timeout()? > NOTE: The reason I stated digging into this isn't because some > benchmark had gone awry, but because we've received in-the-field crash > reports where we have a hung task waiting on the page lock (which is > the equivalent code path on old kernels). While the root cause of > those crashes is likely unrelated and won't be fixed by this patch, > analyzing those crash reports did point out this unbounded wait and it > seemed like something good to fix. > > ALSO NOTE: the timeout mechanism used here uses "jiffies" and we also > will retry up to 7 times. That doesn't give us much accuracy in > specifying the timeout. On 1000 Hz machines we'll end up timing out in > 7-14 ms. On 100 Hz machines we'll end up in 70-140 ms. Given that we > don't have a strong definition of how long "too long" is, this is > probably OK. You can use HZ to work with different configuration. It doesn't help much if your target is 1ms. But I think that it's possible to set it to longer than that in the future. So, some general definition looks better. Best Regards, Huang, Ying > Suggested-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx> > --- > > Changes in v2: > - Keep unbounded delay in "SYNC", delay with a timeout in "SYNC_LIGHT" > > mm/migrate.c | 20 +++++++++++++++++++- > 1 file changed, 19 insertions(+), 1 deletion(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index db3f154446af..60982df71a93 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -58,6 +58,23 @@ > > #include "internal.h" > > +/* Returns the schedule timeout for a non-async mode */ > +static long timeout_for_mode(enum migrate_mode mode) > +{ > + /* > + * We'll always return 1 jiffy as the timeout. Since all places using > + * this timeout are in a retry loop this means that the maximum time > + * we might block is actually NR_MAX_MIGRATE_SYNC_RETRY jiffies. > + * If a jiffy is 1 ms that's 7 ms, though with the accuracy of the > + * timeouts it often ends up more like 14 ms; if a jiffy is 10 ms > + * that's 70-140 ms. > + */ > + if (mode == MIGRATE_SYNC_LIGHT) > + return 1; > + > + return MAX_SCHEDULE_TIMEOUT; > +} > + > bool isolate_movable_page(struct page *page, isolate_mode_t mode) > { > struct folio *folio = folio_get_nontail_page(page); > @@ -1162,7 +1179,8 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page > if (current->flags & PF_MEMALLOC) > goto out; > > - folio_lock(src); > + if (folio_lock_timeout(src, timeout_for_mode(mode))) > + goto out; > } > locked = true;