On Mon 09-10-23 13:58:10, Huang, Ying wrote:
> Jianlin Lv <iecedge@xxxxxxxxx> writes:
>
> > On Sun, Oct 8, 2023 at 4:26 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >> Jianlin Lv <iecedge@xxxxxxxxx> writes:
> >>
> >> > On Sun, Oct 8, 2023 at 9:17 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >> >>
> >> >> Jianlin Lv <iecedge@xxxxxxxxx> writes:
> >> >>
> >> >> > From: Jianlin Lv <iecedge@xxxxxxxxx>
> >> >> >
> >> >> > Global reclaim will swap even if swappiness is set to 0.
> >> >>
> >> >> Why?  Can you elaborate the situation?
> >> >
> >> > We reproduced the issue of pages being swapped out even when swappiness is
> >> > set to 0 in the production environment through the following test program.
> >> > Not sure whether this program can reproduce the issue in any environment.
> >> >
> >> > From the implementation of the get_scan_count code, it can be seen that,
> >> > based on the current runtime situation, memory reclamation will choose a
> >> > scanning method (SCAN_ANON/SCAN_FILE/SCAN_FRACT) to determine how
> >> > aggressively the anon and file LRU are scanned. However, this introduces
> >> > uncertainty.
> >> >
> >> > For the JVM issue at hand, we expect deterministic SCAN_FILE scan to avoid
> >> > swapping out anon pages.
> >>
> >> Why doesn't memory.swap.max work?
> >
> > The main reason is that deployed nodes are kept on cgroups v1.

Please note that cgroups v1 is in maintenance mode, with no new
functionality to be added. What is the reason you are sticking with v1?

> Check the code again.  IIUC, for swappiness == 0, anonymous pages will
> only be reclaimed if sc->file_is_tiny is true.

IIRC, for memcg reclaim (i.e. not the global one) we try to avoid
swapping even when file_is_tiny is set.

> If we don't swap in that
> situation, OOM may be triggerred.  I don't think that it's a good idea
> to do that.  Or I miss something?

Or, even worse, the system might start thrashing heavily over that
remaining tiny page cache.

-- 
Michal Hocko
SUSE Labs
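
For readers following along, here is a rough userspace model of the decision being discussed. It is a sketch only, not the kernel's get_scan_count(): struct reclaim_ctx, pick_scan_balance() and may_scan_anon() are hypothetical names. The ordering of the checks, the SCAN_EQUAL mode and the may_swap, priority and cache_trim_mode fields are assumptions based on one reading of mm/vmscan.c, and may not match a given tree.

```c
#include <stdbool.h>

/* Modes named in the thread; SCAN_EQUAL is an assumption added here. */
enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Hypothetical, trimmed-down stand-in for the kernel's struct scan_control. */
struct reclaim_ctx {
	bool memcg_reclaim;   /* true: memcg (limit) reclaim, false: global */
	bool may_swap;        /* swap is available and allowed */
	bool file_is_tiny;    /* almost no file pages left to reclaim */
	bool cache_trim_mode; /* plenty of inactive page cache */
	int priority;         /* 0 means reclaim is close to OOM */
	int swappiness;       /* 0..200 */
};

/* Pick a balance mode; the order of checks is assumed, not authoritative. */
static enum scan_balance pick_scan_balance(const struct reclaim_ctx *rc)
{
	if (!rc->may_swap)
		return SCAN_FILE;
	/* memcg reclaim with swappiness == 0 never touches anon. */
	if (rc->memcg_reclaim && rc->swappiness == 0)
		return SCAN_FILE;
	if (rc->priority == 0 && rc->swappiness != 0)
		return SCAN_EQUAL;
	/* Global reclaim: force-scan anon once file pages are nearly gone. */
	if (rc->file_is_tiny)
		return SCAN_ANON;
	if (rc->cache_trim_mode)
		return SCAN_FILE;
	return SCAN_FRACT;
}

/* Can anon pages be scanned (and therefore swapped) at all? */
static bool may_scan_anon(const struct reclaim_ctx *rc)
{
	switch (pick_scan_balance(rc)) {
	case SCAN_FILE:
		return false;
	case SCAN_FRACT:
		/* Assumed: the anon share scales with swappiness, so 0 gives none. */
		return rc->swappiness > 0;
	default:
		return true;
	}
}
```

Under these assumptions, global reclaim with swappiness == 0 only reaches anon pages through the file_is_tiny branch, while memcg reclaim with swappiness == 0 stays on SCAN_FILE regardless of file_is_tiny.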