2020년 6월 19일 (금) 오전 10:33, Joonsoo Kim <js1304@xxxxxxxxx>님이 작성: > > On Wed, Jun 17, 2020 at 05:17:17AM -0700, Matthew Wilcox wrote: > > On Wed, Jun 17, 2020 at 02:26:21PM +0900, js1304@xxxxxxxxx wrote: > > > From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > > > > > > Swapcache doesn't handle the exceptional entries since there is no case > > > > Don't call them exceptional entries. > > > > The radix tree has/had the concecpt of exceptional entries. The swapcache > > doesn't use the radix tree any more, it uses the XArray. The XArray > > has value entries. > > > > But you shouldn't call them value entries either; that's an XArray > > concept. The swap cache and indeed page cache use value entries to > > implement shadow entries (they're also used to implement dax entries and > > swap entries in the page cache). So just call them shadow entries here. > > > > I know there are still places which use the term 'nrexceptional' in > > the kernel. I just haven't got round to replacing them yet. Please > > don't add more. > > Okay! Thanks for commenting. > > > > > > +void clear_shadow_from_swap_cache(int type, unsigned long begin, > > > + unsigned long end) > > > +{ > > > + unsigned long curr; > > > + void *old; > > > + swp_entry_t entry = swp_entry(type, begin); > > > + struct address_space *address_space = swap_address_space(entry); > > > + XA_STATE(xas, &address_space->i_pages, begin); > > > + > > > +retry: > > > + xa_lock_irq(&address_space->i_pages); > > > + for (curr = begin; curr <= end; curr++) { > > > + entry = swp_entry(type, curr); > > > + if (swap_address_space(entry) != address_space) { > > > + xa_unlock_irq(&address_space->i_pages); > > > + address_space = swap_address_space(entry); > > > + begin = curr; > > > + xas_set(&xas, begin); > > > + goto retry; > > > + } > > > + > > > + old = xas_load(&xas); > > > + if (!xa_is_value(old)) > > > + continue; > > > + xas_store(&xas, NULL); > > > + address_space->nrexceptional--; > > > + xas_next(&xas); > > > + } > > > + xa_unlock_irq(&address_space->i_pages); > > > +} > > > > This is a very clunky loop. I'm not sure it's even right, given that > > you change address space without changing the xas's address space. How > > about this? > > You are correct. The xas's address space should be changed. > > > > for (;;) { > > XA_STATE(xas, &address_space->i_pages, begin); > > unsigned long nr_shadows = 0; > > > > xas_lock_irq(&xas); > > xas_for_each(&xas, entry, end) { > > if (!xa_is_value(entry)) > > continue; > > xas_store(&xas, NULL); > > nr_shadows++; > > } > > address_space->nr_exceptionals -= nr_shadows; > > xas_unlock_irq(&xas); > > > > if (xas.xa_index >= end) > > break; > > entry = swp_entry(type, xas.xa_index); > > address_space = swap_address_space(entry); > > } > > Thanks for suggestion. > > I make a patch based on your suggestion. IIUC about Xarray, > after running xas_for_each(), xas.xa_index can be less than the end if > there are empty slots on last portion of array. Handling this case is > also considered in my patch. Hello, Matthew. Could you check if the following patch (Xarray part) is correct? Since I made a patch based on your suggestion, I'd like to get your review. :) Thanks. > Thanks. > > ------------------->8-------------------------------- > From 72e97600ea294372a13ab8e208ebd3c0e1889408 Mon Sep 17 00:00:00 2001 > From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > Date: Fri, 15 Nov 2019 09:48:32 +0900 > Subject: [PATCH v6 4/6] mm/swapcache: support to handle the shadow entries > > Workingset detection for anonymous page will be implemented in the > following patch and it requires to store the shadow entries into the > swapcache. This patch implements an infrastructure to store the shadow > entry in the swapcache. > > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > --- > include/linux/swap.h | 17 ++++++++++++---- > mm/shmem.c | 3 ++- > mm/swap_state.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++------ > mm/swapfile.c | 2 ++ > mm/vmscan.c | 2 +- > 5 files changed, 69 insertions(+), 12 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index f4f5f94..901da54 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -416,9 +416,13 @@ extern struct address_space *swapper_spaces[]; > extern unsigned long total_swapcache_pages(void); > extern void show_swap_cache_info(void); > extern int add_to_swap(struct page *page); > -extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t); > -extern void __delete_from_swap_cache(struct page *, swp_entry_t entry); > +extern int add_to_swap_cache(struct page *page, swp_entry_t entry, > + gfp_t gfp, void **shadowp); > +extern void __delete_from_swap_cache(struct page *page, > + swp_entry_t entry, void *shadow); > extern void delete_from_swap_cache(struct page *); > +extern void clear_shadow_from_swap_cache(int type, unsigned long begin, > + unsigned long end); > extern void free_page_and_swap_cache(struct page *); > extern void free_pages_and_swap_cache(struct page **, int); > extern struct page *lookup_swap_cache(swp_entry_t entry, > @@ -572,13 +576,13 @@ static inline int add_to_swap(struct page *page) > } > > static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, > - gfp_t gfp_mask) > + gfp_t gfp_mask, void **shadowp) > { > return -1; > } > > static inline void __delete_from_swap_cache(struct page *page, > - swp_entry_t entry) > + swp_entry_t entry, void *shadow) > { > } > > @@ -586,6 +590,11 @@ static inline void delete_from_swap_cache(struct page *page) > { > } > > +static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, > + unsigned long end) > +{ > +} > + > static inline int page_swapcount(struct page *page) > { > return 0; > diff --git a/mm/shmem.c b/mm/shmem.c > index a0dbe62..e9a99a2 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -1374,7 +1374,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc) > list_add(&info->swaplist, &shmem_swaplist); > > if (add_to_swap_cache(page, swap, > - __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) { > + __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN, > + NULL) == 0) { > spin_lock_irq(&info->lock); > shmem_recalc_inode(inode); > info->swapped++; > diff --git a/mm/swap_state.c b/mm/swap_state.c > index 1050fde..49a66dc 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -110,12 +110,14 @@ void show_swap_cache_info(void) > * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space, > * but sets SwapCache flag and private instead of mapping and index. > */ > -int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp) > +int add_to_swap_cache(struct page *page, swp_entry_t entry, > + gfp_t gfp, void **shadowp) > { > struct address_space *address_space = swap_address_space(entry); > pgoff_t idx = swp_offset(entry); > XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page)); > unsigned long i, nr = hpage_nr_pages(page); > + void *old; > > VM_BUG_ON_PAGE(!PageLocked(page), page); > VM_BUG_ON_PAGE(PageSwapCache(page), page); > @@ -125,16 +127,25 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp) > SetPageSwapCache(page); > > do { > + unsigned long nr_shadows = 0; > + > xas_lock_irq(&xas); > xas_create_range(&xas); > if (xas_error(&xas)) > goto unlock; > for (i = 0; i < nr; i++) { > VM_BUG_ON_PAGE(xas.xa_index != idx + i, page); > + old = xas_load(&xas); > + if (xa_is_value(old)) { > + nr_shadows++; > + if (shadowp) > + *shadowp = old; > + } > set_page_private(page + i, entry.val + i); > xas_store(&xas, page); > xas_next(&xas); > } > + address_space->nrexceptional -= nr_shadows; > address_space->nrpages += nr; > __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr); > ADD_CACHE_INFO(add_total, nr); > @@ -154,7 +165,8 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp) > * This must be called only on pages that have > * been verified to be in the swap cache. > */ > -void __delete_from_swap_cache(struct page *page, swp_entry_t entry) > +void __delete_from_swap_cache(struct page *page, > + swp_entry_t entry, void *shadow) > { > struct address_space *address_space = swap_address_space(entry); > int i, nr = hpage_nr_pages(page); > @@ -166,12 +178,14 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry) > VM_BUG_ON_PAGE(PageWriteback(page), page); > > for (i = 0; i < nr; i++) { > - void *entry = xas_store(&xas, NULL); > + void *entry = xas_store(&xas, shadow); > VM_BUG_ON_PAGE(entry != page, entry); > set_page_private(page + i, 0); > xas_next(&xas); > } > ClearPageSwapCache(page); > + if (shadow) > + address_space->nrexceptional += nr; > address_space->nrpages -= nr; > __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); > ADD_CACHE_INFO(del_total, nr); > @@ -208,7 +222,7 @@ int add_to_swap(struct page *page) > * Add it to the swap cache. > */ > err = add_to_swap_cache(page, entry, > - __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN); > + __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL); > if (err) > /* > * add_to_swap_cache() doesn't return -EEXIST, so we can safely > @@ -246,13 +260,44 @@ void delete_from_swap_cache(struct page *page) > struct address_space *address_space = swap_address_space(entry); > > xa_lock_irq(&address_space->i_pages); > - __delete_from_swap_cache(page, entry); > + __delete_from_swap_cache(page, entry, NULL); > xa_unlock_irq(&address_space->i_pages); > > put_swap_page(page, entry); > page_ref_sub(page, hpage_nr_pages(page)); > } > > +void clear_shadow_from_swap_cache(int type, unsigned long begin, > + unsigned long end) > +{ > + unsigned long curr = begin; > + void *old; > + > + for (;;) { > + unsigned long nr_shadows = 0; > + swp_entry_t entry = swp_entry(type, curr); > + struct address_space *address_space = swap_address_space(entry); > + XA_STATE(xas, &address_space->i_pages, curr); > + > + xa_lock_irq(&address_space->i_pages); > + xas_for_each(&xas, old, end) { > + if (!xa_is_value(old)) > + continue; > + xas_store(&xas, NULL); > + nr_shadows++; > + } > + address_space->nrexceptional -= nr_shadows; > + xa_unlock_irq(&address_space->i_pages); > + > + /* search the next swapcache until we meet end */ > + curr >>= SWAP_ADDRESS_SPACE_SHIFT; > + curr++; > + curr <<= SWAP_ADDRESS_SPACE_SHIFT; > + if (curr > end) > + break; > + } > +} > + > /* > * If we are the only user, then try to free up the swap cache. > * > @@ -429,7 +474,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > __SetPageSwapBacked(page); > > /* May fail (-ENOMEM) if XArray node allocation failed. */ > - if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL)) { > + if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL, NULL)) { > put_swap_page(page, entry); > goto fail_unlock; > } > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 38f6433..4630db1 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -696,6 +696,7 @@ static void add_to_avail_list(struct swap_info_struct *p) > static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > unsigned int nr_entries) > { > + unsigned long begin = offset; > unsigned long end = offset + nr_entries - 1; > void (*swap_slot_free_notify)(struct block_device *, unsigned long); > > @@ -721,6 +722,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > swap_slot_free_notify(si->bdev, offset); > offset++; > } > + clear_shadow_from_swap_cache(si->type, begin, end); > } > > static void set_cluster_next(struct swap_info_struct *si, unsigned long next) > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 3caa35f..37943bf 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -901,7 +901,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page, > if (PageSwapCache(page)) { > swp_entry_t swap = { .val = page_private(page) }; > mem_cgroup_swapout(page, swap); > - __delete_from_swap_cache(page, swap); > + __delete_from_swap_cache(page, swap, NULL); > xa_unlock_irqrestore(&mapping->i_pages, flags); > put_swap_page(page, swap); > workingset_eviction(page, target_memcg); > -- > 2.7.4 >