Andrea Righi <andrea.righi@xxxxxxxxxxxxx> writes:

> On Mon, Apr 13, 2020 at 09:00:34PM +0800, Huang, Ying wrote:
>> Andrea Righi <andrea.righi@xxxxxxxxxxxxx> writes:
>>
>> [snip]
>>
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index ebed37bbf7a3..c71abc8df304 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -20,6 +20,7 @@
>> >  #include <linux/migrate.h>
>> >  #include <linux/vmalloc.h>
>> >  #include <linux/swap_slots.h>
>> > +#include <linux/oom.h>
>> >  #include <linux/huge_mm.h>
>> >
>> >  #include <asm/pgtable.h>
>> > @@ -507,6 +508,14 @@ static unsigned long swapin_nr_pages(unsigned long offset)
>> >  	max_pages = 1 << READ_ONCE(page_cluster);
>> >  	if (max_pages <= 1)
>> >  		return 1;
>> > +	/*
>> > +	 * If current task is using too much memory or swapoff is running
>> > +	 * simply use the max readahead size. Since we likely want to load a
>> > +	 * lot of pages back into memory, using a fixed-size max readhaead can
>> > +	 * give better performance in this case.
>> > +	 */
>> > +	if (oom_task_origin(current))
>> > +		return max_pages;
>> >
>> >  	hits = atomic_xchg(&swapin_readahead_hits, 0);
>> >  	pages = __swapin_nr_pages(prev_offset, offset, hits, max_pages,
>>
>> Thinks this again.  If my understanding were correct, the accessing
>> pattern during swapoff is sequential, why swap readahead doesn't work?
>> If so, can you root cause that firstly?
>
> Theoretically if the pattern is sequential the current heuristic should
> already select a big readahead size, but apparently it's not doing that.
>
> I'll repeat my tests tracing the readahead size during swapoff to see
> exactly what's going on here.

I haven't verified this, but it may help to call lookup_swap_cache()
before swapin_readahead() in unuse_pte_range().  The theory is that
lookup_swap_cache() updates the swap readahead statistics, which the
readahead heuristic needs in order to grow the window.

Best Regards,
Huang, Ying
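
To make the theory above concrete, here is a small userspace model of the
readahead window heuristic. It is only a sketch: model_nr_pages() loosely
follows the shape of __swapin_nr_pages() in mm/swap_state.c, while
model_sequential_scan() and its hit accounting (every page of the previous
window except the faulting one counts as a hit) are assumptions made for
illustration, not kernel code and not something verified against a trace.

```c
#include <assert.h>

/*
 * Simplified userspace model of the swap readahead window heuristic.
 * Loosely based on __swapin_nr_pages(); not the kernel implementation.
 */
static unsigned int model_nr_pages(unsigned long prev_offset,
				   unsigned long offset,
				   unsigned int hits,
				   unsigned int max_pages,
				   unsigned int prev_win)
{
	unsigned int pages = hits + 2;
	unsigned int last_ra;

	assert(max_pages >= 1);

	if (pages == 2) {
		/* No hits recorded: only an adjacent offset keeps 2 pages. */
		if (offset != prev_offset + 1 && offset != prev_offset - 1)
			pages = 1;
	} else {
		/* Round up to a power of two, with a minimum of 4. */
		unsigned int roundup = 4;

		while (roundup < pages)
			roundup <<= 1;
		pages = roundup;
	}

	if (pages > max_pages)
		pages = max_pages;

	/* Do not shrink the window by more than half per call. */
	last_ra = prev_win / 2;
	if (pages < last_ra)
		pages = last_ra;

	return pages;
}

/*
 * Hypothetical driver for a sequential scan (as swapoff is assumed to do).
 * count_hits != 0 models a lookup before readahead: pages already read
 * ahead are found in the cache, counted as hits, and skipped.
 * count_hits == 0 models no lookup: every entry goes through readahead
 * and no hit is ever recorded.
 */
static unsigned int model_sequential_scan(unsigned int steps,
					  unsigned int max_pages,
					  int count_hits)
{
	unsigned long offset = 100;	/* arbitrary starting offset */
	unsigned long prev_offset = offset - 1;
	unsigned int win = 1;
	unsigned int i;

	for (i = 0; i < steps; i++) {
		unsigned int hits = count_hits ? win - 1 : 0;

		win = model_nr_pages(prev_offset, offset, hits, max_pages, win);
		prev_offset = offset;
		offset += count_hits ? win : 1;
	}
	return win;
}
```

In this model, with max_pages = 8, a sequential scan that records hits
reaches the full window within four steps, while one that never records
hits stays at two pages.  That would be consistent with the small readahead
size seen during swapoff, if the statistics are indeed not being updated.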