On Mon, Feb 21, 2022 at 12:55 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Sat 19-02-22 09:49:40, Suren Baghdasaryan wrote:
> > When page allocation in direct reclaim path fails, the system will
> > make one attempt to shrink per-cpu page lists and free pages from
> > high alloc reserves. Draining per-cpu pages into buddy allocator can
> > be a very slow operation because it's done using workqueues and the
> > task in direct reclaim waits for all of them to finish before
> > proceeding. Currently this time is not accounted as psi memory stall.
> >
> > While testing mobile devices under extreme memory pressure, when
> > allocations are failing during direct reclaim, we noticed that psi
> > events which would be expected in such conditions were not triggered.
> > After profiling these cases it was determined that the reason for
> > missing psi events was that a big chunk of time spent in direct
> > reclaim is not accounted as memory stall, therefore psi would not
> > reach the levels at which an event is generated. Further investigation
> > revealed that the bulk of that unaccounted time was spent inside
> > drain_all_pages call.
>
> It would be cool to have some numbers here.

A typical case I was able to record when drain_all_pages path gets
activated:

__alloc_pages_slowpath took 44.644.613ns
    __perform_reclaim 751.668ns (1.7%)
    drain_all_pages took 43.887.167ns (98.3%)

PSI in this case records the time spent in __perform_reclaim but
ignores drain_all_pages, IOW it misses 98.3% of the time spent in
__alloc_pages_slowpath. Sure, normally it's not often that this path
is activated, but when it is, we miss reporting most of the stall.

> > Annotate drain_all_pages and unreserve_highatomic_pageblock during
> > page allocation failure in the direct reclaim path so that delays
> > caused by these calls are accounted as memory stall.
>
> If the draining is too slow and dependent on the current CPU/WQ
> contention then we should address that. The original intention was that
> having a dedicated WQ with WQ_MEM_RECLAIM would help to isolate the
> operation from the rest of WQ activity. Maybe we need to fine tune
> mm_percpu_wq. If that doesn't help then we should revise the WQ model
> and use something else. Memory reclaim shouldn't really get stuck behind
> other unrelated work.

Agree. However even after improving this I think we should record the
time spent in drain_all_pages as psi memstall. So, this patch I
believe is still relevant.
Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs
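
For reference, the percentages in the measurement above can be
reproduced with a small userspace C sketch. This is only an
illustration under stated assumptions, not kernel code:
share_percent() and accounted_ns() are hypothetical helper names, and
the model simply assumes that psi currently sees the time in
__perform_reclaim and would additionally see the time in
drain_all_pages once that call is annotated as memstall.

```c
#include <assert.h>
#include <stdint.h>

/* Timings quoted in the message above, in nanoseconds. */
#define SLOWPATH_NS        UINT64_C(44644613) /* __alloc_pages_slowpath */
#define PERFORM_RECLAIM_NS UINT64_C(751668)   /* __perform_reclaim */
#define DRAIN_ALL_PAGES_NS UINT64_C(43887167) /* drain_all_pages */

/* Percentage of total_ns that part_ns represents. */
static double share_percent(uint64_t part_ns, uint64_t total_ns)
{
    assert(total_ns > 0);
    return 100.0 * (double)part_ns / (double)total_ns;
}

/*
 * Hypothetical model (an assumption, not kernel behaviour verified here):
 * stall time that psi would account for this one slowpath call.
 * drain_annotated == 0: only __perform_reclaim is counted.
 * drain_annotated != 0: drain_all_pages is counted as well.
 */
static uint64_t accounted_ns(int drain_annotated)
{
    uint64_t ns = PERFORM_RECLAIM_NS;

    if (drain_annotated)
        ns += DRAIN_ALL_PAGES_NS;
    return ns;
}
```

With these inputs, share_percent(accounted_ns(0), SLOWPATH_NS) comes
out at roughly 1.7%, and share_percent(DRAIN_ALL_PAGES_NS, SLOWPATH_NS)
at roughly 98.3%, matching the figures in the recorded case.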