On 2024/3/19 19:09, Barry Song wrote:
> On Tue, Mar 19, 2024 at 4:56 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>>
>> On Fri 15-03-24 16:18:03, liuhailong@xxxxxxxx wrote:
>>> From: "Hailong.Liu" <liuhailong@xxxxxxxx>
>>>
>>> This reverts
>>> commit b7108d66318a ("Multi-gen LRU: skip CMA pages when they are not eligible")
>>> commit 5da226dbfce3 ("mm: skip CMA pages when they are not available")
>>>
>>> skip_cma may cause system not responding. if cma pages is large in lru_list
>>> and system is in lowmemory, many tasks would direct reclaim and waste
>>> cpu time to isolate_lru_pages and return.
>>>
>>> Test this patch on android-5.15 8G device
>>> reproducer:
>>> - cma_declare_contiguous 3G pages
>>> - set /proc/sys/vm/swappiness 0 to enable direct_reclaim reclaim file
>>> only.
>>> - run a memleak process in userspace
>>
>> Does this represent a sane configuration? CMA memory is unusable for
>> kernel allocations and memleak process is also hard to reclaim due to
>> swap suppression. Isn't such a system doomed to struggle to reclaim any
>> memory?

Yes. All processes in the system are also hard to reclaim, and all of
them enter direct reclaim. With skip_cma, many of the processes that
have to skip CMA pages retry, scan, and skip again inside
isolate_lru_pages. System processes have high priority, so some normal
processes (like kswapd) are preempted.

>> Btw. how does the same setup behave with the regular LRU
>> implementation? My guess would be that it would struggle as well.
>
> I assume the regular LRU implementation you are talking about is the LRU
> without skip_cma()?
>
> I remember Hailong mentioned something like " it also trigger memory psi
> event to allow admin do something to release memory" and " without
> patch the devices would kill camera process". So it seems the difference
> is if a killing will occur.
>
> Hailong, would you like to provide more detail?

The psi event is triggered after psi_memstall_leave. Many system
processes go through perform_reclaim, scan, skip, and leave without
reclaiming any pages. Each pass is fast, so lmkd (the userspace low
memory killer) cannot work as it did before.

>
>> --
>> Michal Hocko
>> SUSE Labs
>>
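
To illustrate the scan-and-skip pattern described above, here is a toy
Python model. It is not kernel code: toy_isolate, ScanResult, and the
can_use_cma flag are hypothetical names made up for this sketch, and the
real isolate_lru_pages does considerably more. It only shows that when
every page in a scan batch is a CMA page and the caller cannot use CMA,
the whole batch is scanned and nothing is isolated.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Counters for one toy scan pass (hypothetical, for illustration)."""
    scanned: int = 0
    skipped: int = 0
    isolated: int = 0


def toy_isolate(lru_is_cma, nr_to_scan, can_use_cma):
    """Walk up to nr_to_scan entries of a toy LRU list.

    lru_is_cma is a list of booleans, True meaning "this page is in CMA".
    If the caller cannot use CMA pages, CMA pages are skipped instead of
    isolated. This is an assumed simplification of the skip_cma idea.
    """
    result = ScanResult()
    for is_cma in lru_is_cma[:nr_to_scan]:
        result.scanned += 1
        if is_cma and not can_use_cma:
            result.skipped += 1
            continue
        result.isolated += 1
    return result


# Assumed worst case: the scanned batch consists only of CMA pages.
lru = [True] * 32
print(toy_isolate(lru, 32, can_use_cma=False))  # all scanned, none isolated
print(toy_isolate(lru, 32, can_use_cma=True))   # all scanned, all isolated
```

In this model a caller that cannot use CMA pays the full scan cost and
returns with nothing to reclaim, which is the pattern reported for the
reproducer. Batch size and CMA ratio here are arbitrary.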