Re: [RFC PATCH] migrate_pages: Never block waiting for the page lock

"Huang, Ying" <ying.huang@xxxxxxxxx> · Mon, 17 Apr 2023 09:14:38 +0800

Doug Anderson <dianders@xxxxxxxxxxxx> writes:

> Hi,
>
> On Thu, Apr 13, 2023 at 8:10 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>
>> Douglas Anderson <dianders@xxxxxxxxxxxx> writes:
>>
>> > Currently when we try to do page migration and we're in "synchronous"
>> > mode (and not doing direct compaction) then we'll wait an infinite
>> > amount of time for a page lock. This does not appear to be a great
>> > idea.
>> >
>> > One issue can be seen when I put a device under extreme memory
>> > pressure. I took a sc7180-trogdor Chromebook (4GB RAM, 8GB zram
>> > swap). I ran the browser along with Android (which runs from a
>> > loopback mounted 128K block-size squashfs "disk"). I then manually ran
>> > the mmm_donut memory pressure tool [1]. The system is completely
>> > unusable both with and without this patch since there are 8 processes
>> > completely thrashing memory, but it was still interesting to look at
>> > how migration was behaving. I put some timing code in and I could see
>> > that we sometimes waited over 25 seconds (in the context of
>> > kcompactd0) for a page lock to become available. Although the 25
>> > seconds was the high mark, it was easy to see tens, hundreds, or
>> > thousands of milliseconds spent waiting on the lock.
>> >
>> > Instead of waiting, if I bailed out right away (as this patch does), I
>> > could see kcompactd0 move forward to successfully to migrate other
>> > pages instead. This seems like a better use of kcompactd's time.
>> >
>> > Thus, even though this didn't make the system any more usable in my
>> > absurd test case, it still seemed to make migration behave better and
>> > that feels like a win. It also makes the code simpler since we have
>> > one fewer special case.
>>
>> TBH, the test case is too extreme for me.
>
> That's fair. That being said, I guess the point I was trying to make
> is that waiting for this lock could take an unbounded amount of time.
> Other parts of the system sometimes hold a page lock and then do a
> blocking operation. At least in the case of kcompactd there are better
> uses of its time than waiting for any given page.
>
>> And, we have multiple "sync" mode to deal with latency requirement, for
>> example, we use MIGRATE_SYNC_LIGHT for compaction to avoid too long
>> latency.  If you have latency requirement for some users, you may
>> consider to add new "sync" mode.
>
> Sure. kcompactd_do_work() is currently using MIGRATE_SYNC_LIGHT. I
> guess my first thought would be to avoid adding a new mode and make
> MIGRATE_SYNC_LIGHT not block here. Then anyone that truly needs to
> wait for all the pages to be migrated can use the heavier sync modes.
> It seems to me like the current users of MIGRATE_SYNC_LIGHT would not
> want to block for an unbounded amount of time here. What do you think?

It appears that you can just use MIGRATE_ASYNC if you think the correct
behavior is "NOT block at all".  I found that there are more
fine-grained controls on this in compaction code, please take a look at
"enum compact_priority" and its comments.

Best Regards,
Huang, Ying