Kairui Song <ryncsn@xxxxxxxxx> writes:

> Huang, Ying <ying.huang@xxxxxxxxx> wrote on Mon, Nov 20, 2023 at 14:07:
>>
>> Kairui Song <ryncsn@xxxxxxxxx> writes:
>>
>> > From: Kairui Song <kasong@xxxxxxxxxxx>
>> >
>> > Currently VMA readahead is globally disabled when any rotational disk is
>> > used as a swap backend. So when multiple swap devices are enabled, if a slower
>> > hard disk is set as a low priority fallback, and a high performance SSD
>> > is used as the high priority swap device, VMA readahead is disabled globally.
>> > The SSD swap device performance will drop by a lot.
>> >
>> > Check readahead policy per entry to avoid this problem.
>> >
>> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
>> > ---
>> >  mm/swap_state.c | 12 +++++++-----
>> >  1 file changed, 7 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/mm/swap_state.c b/mm/swap_state.c
>> > index ff6756f2e8e4..fb78f7f18ed7 100644
>> > --- a/mm/swap_state.c
>> > +++ b/mm/swap_state.c
>> > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_
>> >  	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
>> >  }
>> >
>> > -static inline bool swap_use_vma_readahead(void)
>> > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
>> >  {
>> > -	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
>> > +	return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead);
>> >  }
>> >
>> >  /*
>> > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
>> >
>> >  	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
>> >  	if (!IS_ERR(folio)) {
>> > -		bool vma_ra = swap_use_vma_readahead();
>> > +		bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry));
>> >  		bool readahead;
>> >
>> >  		/*
>> > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> >  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>> >  			      struct vm_fault *vmf, bool *swapcached)
>> >  {
>> > +	struct swap_info_struct *si;
>> >  	struct mempolicy *mpol;
>> >  	struct page *page;
>> >  	pgoff_t ilx;
>> >  	bool cached;
>> >
>> > +	si = swp_swap_info(entry);
>> >  	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
>> > -	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
>> > +	if (swap_use_no_readahead(si, entry)) {
>> >  		page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
>> >  		cached = false;
>> > -	} else if (swap_use_vma_readahead()) {
>> > +	} else if (swap_use_vma_readahead(si)) {
>>
>> It's possible that some pages are swapped out to SSD while others are
>> swapped out to HDD in a readahead window.
>>
>> I suspect that there are practical requirements to use swap on SSD and
>> HDD at the same time.
>
> Hi Ying,
>
> Thanks for the review!
>
> For the first issue "fragmented readahead window", I was planning to
> do an extra check in the readahead path to skip readahead entries that are
> on different swap devices, which is not hard to do,

This is a possible solution.

> but this series is growing too long so I thought it would be better
> done later.

You don't need to keep everything in one series.  Just use multiple
series, even if they are all swap-related.  They deal with different
problems, in fact.

> For the second issue, "is there any practical use for multiple swap",
> I think there actually is. For example, we are trying to use multi
> layer swap for offloading memory of different hotness on servers. And
> we also tried to implement a mechanism to migrate long sleep swap
> entries from high performance SSD/RAMDISK swap to a cheap HDD swap
> device, with more than two layers of swap, which worked except for the
> upstream issue that readahead policy will no longer work as expected.

Thanks for the information.

>> >  		page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
>> >  		cached = true;
>> >  	} else {

--
Best Regards,
Huang, Ying