Hi Zhengjun,

On Thu, 22 Apr 2021 16:36:19 +0800 Zhengjun Xing wrote:

> In the system with very few file pages (nr_active_file +
> nr_inactive_file < 100), it is easy to reproduce "nr_isolated_file >
> nr_inactive_file"; then too_many_isolated returns true,
> shrink_inactive_list enters "msleep(100)", and the long latency will
> happen.  We should skip reclaiming page cache in this case.
>
> The test case to reproduce it is very simple: allocate many huge
> pages (near the DRAM size), then free them, and repeat the same
> operation many times.
>
> In the test case, the system has very few file pages (nr_active_file
> + nr_inactive_file < 100).  I have dumped the numbers of
> active/inactive/isolated file pages during the whole test (see the
> attachments).  In shrink_inactive_list, "too_many_isolated" very
> easily returns true, and then we enter "msleep(100)".  In
> "too_many_isolated", sc->gfp_mask is 0x342cca ("__GFP_IO" and
> "__GFP_FS" are masked), so it is also very easy to enter
> "inactive >>= 3", and then "isolated > inactive" will be true.
>
> So I have a proposal to set a threshold number for the total file
> pages, ignore the system with very few file pages, and then bypass
> the 100ms sleep.  It is hard to set a perfect number for the
> threshold, so I just give an example of "256" for it.

Another option is to take the nap only the second time we see too many
isolated LRU pages, so that some allocators in your case are served
without the 100ms delay.

+++ x/mm/vmscan.c
@@ -118,6 +118,9 @@ struct scan_control {
 	/* The file pages on the current node are dangerously low */
 	unsigned int file_is_tiny:1;
 
+	unsigned int file_tmi:1;	/* too many isolated */
+	unsigned int anon_tmi:1;
+
 	/* Allocation order */
 	s8 order;
 
@@ -1905,6 +1908,21 @@ static int current_may_throttle(void)
 		bdi_write_congested(current->backing_dev_info);
 }
 
+static void update_sc_tmi(struct scan_control *sc, bool file, int set)
+{
+	if (file)
+		sc->file_tmi = set;
+	else
+		sc->anon_tmi = set;
+}
+static bool is_sc_tmi(struct scan_control *sc, bool file)
+{
+	if (file)
+		return sc->file_tmi != 0;
+	else
+		return sc->anon_tmi != 0;
+}
+
 /*
  * shrink_inactive_list() is a helper for shrink_node().  It returns the number
  * of reclaimed pages
@@ -1927,6 +1945,11 @@ shrink_inactive_list(unsigned long nr_to
 		if (stalled)
 			return 0;
 
+		if (!is_sc_tmi(sc, file)) {
+			update_sc_tmi(sc, file, 1);
+			return 0;
+		}
+
 		/* wait a bit for the reclaimer. */
 		msleep(100);
 		stalled = true;
@@ -1936,6 +1959,9 @@ shrink_inactive_list(unsigned long nr_to
 			return SWAP_CLUSTER_MAX;
 	}
 
+	if (is_sc_tmi(sc, file))
+		update_sc_tmi(sc, file, 0);
+
 	lru_add_drain();
 	spin_lock_irq(&lruvec->lru_lock);
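
In case it helps to see the numbers, below is a tiny userspace model of
the file-LRU check you describe (paraphrased from your mail, not copied
from mm/vmscan.c; the function and variable names are made up for the
illustration).  The point is that once "inactive >>= 3" applies, 60
inactive file pages leave a threshold of only 7, so isolating as few as
8 pages, a quarter of SWAP_CLUSTER_MAX, is already enough to take the
msleep(100) path:

/*
 * Toy model only, not kernel code.  It mimics the check described in
 * the quoted mail: when the gfp mask allows IO/FS, shift the inactive
 * count right by 3, then compare against the isolated count.
 */
#include <stdbool.h>
#include <stdio.h>

static bool too_many_isolated_file_model(unsigned long inactive_file,
					 unsigned long isolated_file,
					 bool gfp_allows_io_fs)
{
	if (gfp_allows_io_fs)		/* the "inactive >>= 3" case */
		inactive_file >>= 3;

	return isolated_file > inactive_file;
}

int main(void)
{
	/* roughly the situation from the report: < 100 file pages */
	unsigned long inactive_file = 60;
	unsigned long isolated;

	for (isolated = 0; isolated <= 12; isolated += 4)
		printf("inactive=%lu isolated=%lu -> throttle=%d\n",
		       inactive_file, isolated,
		       too_many_isolated_file_model(inactive_file,
						    isolated, true));
	return 0;
}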
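
And to spell out what the sketch above intends: the first time a
reclaim pass sees too many isolated pages on an LRU, it gives up on
that LRU (returns 0) instead of sleeping, and only a repeat within the
same scan_control takes the 100ms nap.  Again a toy userspace model
with invented names; only the idea mirrors the diff:

/* Toy model of the skip-once-then-sleep idea from the diff above. */
#include <stdbool.h>
#include <stdio.h>

struct toy_scan_control {
	bool file_tmi;		/* "too many isolated" seen already? */
};

/* Returns 1 if this call would have slept for 100ms, 0 otherwise. */
static int throttle_file_lru(struct toy_scan_control *sc)
{
	if (!sc->file_tmi) {
		sc->file_tmi = true;	/* remember the first occurrence */
		return 0;		/* bail out of this LRU, no nap */
	}
	/* second occurrence within the same reclaim pass: nap */
	return 1;
}

int main(void)
{
	struct toy_scan_control sc = { .file_tmi = false };

	printf("1st too_many_isolated hit -> slept=%d\n",
	       throttle_file_lru(&sc));
	printf("2nd too_many_isolated hit -> slept=%d\n",
	       throttle_file_lru(&sc));
	return 0;
}

In your huge-page alloc/free loop that should mean the first direct
reclaimer to trip the check on the tiny file LRU returns immediately
instead of adding 100ms of latency.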