On Tue, Apr 18, 2023 at 11:17:23AM +0200, Vlastimil Babka wrote: > > Actually, the more I think about it the more I think the right answer > > is to keep kcompactd as using MIGRATE_SYNC_LIGHT and make > > MIGRATE_SYNC_LIGHT not block on the folio lock. kcompactd can accept > > some blocking but we don't want long / unbounded blocking. Reading the > > comments for MIGRATE_SYNC_LIGHT, this also seems like it fits pretty > > well. MIGRATE_SYNC_LIGHT says that the stall time of writepage() is > > too much. It's entirely plausible that someone else holding the lock > > is doing something as slow as writepage() and thus waiting on the lock > > can be just as bad for latency. > > +Cc Mel for potential insights. Sounds like a good compromise at first > glance, but it's a tricky area. > Also there are other callers of migration than compaction, and we should > make sure we are not breaking them unexpectedly. > It's tricky because part of the point of SYNC_LIGHT was to block on some operations but not for too long. writepage was considered to be an exception because it can be very slow for a variety of reasons. I think At the time that writeback from reclaim context was possible and it was very inefficient, more more inefficient than reads. Storage can also be very slow (e.g. USB stick plugged in) and there can be large differences between read/write performance (SMR, some ssd etc). Pages read were generally clean and could be migrated, pages for write may be redirtied etc. It was assumed that while both read/write could lock the page for a long time, write had worse hold times and most other users of lock page were transient. A compromise for SYNC_LIGHT or even SYNC on lock page would be to try locking with a timeout. I don't think there is a specific helper but it should be possible to trylock, wait on the folio_waitqueue and attempt once to get the lock again. I didn't look very closely but it would doing something similar to folio_wait_bit_common() with io_schedule_timeout instead of io_schedule. This will have false positives because the folio waitqueue may be woken for unrelated pages and obviously it can race with other wait queues. kcompactd is an out-of-line wait and can afford to wait for a long time without user-visible impact but 120 seconds or any potentially unbounded length of time is ridiculous and unhelpful. I would still be wary about adding new sync modes or making major modifications to kcompactd because detecting application stalls due to a kcompactd modification is difficult. There is another approach -- kcompactd or proactive under heavy memory pressure is probably a waste of CPU time and resources and should avoid or minimise effort when under pressure. While direct compaction can capture a page for immediate use, kcompactd and proactive reclaim are just shuffling memory around for *potential* users and may be making the overall memory pressure even worse. If memory pressure detection was better and proactive/kcompactd reclaim bailed then the unbounded time to lock a page is mitigated or completely avoided. The two options are ortogonal. trylock_page_timeout() would mitigate worst case behaviour while memory pressure detection and bailing on SYNC* compaction side-steps the issue in extreme cases. Both have value as pressure detection would be best-effort and trylock_page_timeout() would limit worst-case behaviour. > > I'll try to send out a v2 with this approach today and we can see what > > people think. > > Please Cc Mel and myself for further versions. > Yes please. -- Mel Gorman SUSE Labs