On Fri, Nov 18, 2016 at 01:26:59PM -0800, Hugh Dickins wrote: > On Fri, 18 Nov 2016, Minchan Kim wrote: > > On Thu, Nov 17, 2016 at 08:35:10PM -0800, Hugh Dickins wrote: > > > > > > Maybe add SWP_STABLE_WRITES in include/linux/swap.h, and set that > > > in swap_info->flags according to bdi_cap_stable_pages_required(), > > > leaving mapping->host itself NULL as before? > > > > The problem with the approach is that we need to get swap_info_struct > > in reuse_swap_page so maybe, every caller should pass swp_entry_t > > into reuse_swap_page. It would be no problem if swap slot is really > > referenced the page(IOW, pte is real swp_entry_t) but some cases > > where swap slot is already empty but the page remains in only > > swap cache, we cannot pass swp_entry_t which means that we cannot > > get swap_info_struct. > > I don't see the problem: if the page is PageSwapCache (and page > lock is held), then the swp_entry_t is there in page->private: > see page_swapcount(), which reuse_swap_page() just called. Right you are!! > > > > > So, if I didn't miss, another option I can imagine is to move > > SWP_STABLE_WRITES to address_space->flags as AS_STABLE_WRITES. > > With that, we can always get the information without passing > > swp_entry_t. Is there any better idea? > > I think what you suggest below would work fine (and be quicker > than looking up the swap_info again): but is horribly misleading > for anyone else interested in stable writes, for whom the info is > kept in the bdi, and not in this new mapping flag. > > So I'd much prefer that you keep the swap special case inside the > world of swap, with a SWP_STABLE_WRITES flag. Maybe unfold > page_swapcount() inside reuse_swap_page(), so that it only > needs a single lookup (or perhaps I'm optimizing prematurely). > I toally agree. Here is new one. >From 90b43e41a9fd8091e39246add583886c23360cdd Mon Sep 17 00:00:00 2001 From: Minchan Kim <minchan@xxxxxxxxxx> Date: Fri, 11 Nov 2016 15:02:57 +0900 Subject: [PATCH v2] mm: support anonymous stable page For developemnt for zram-swap asynchronous writeback, I found strange corruption of compressed page. With investigation, it reveals currently stable page doesn't support anonymous page. IOW, reuse_swap_page can reuse the page without waiting writeback completion so that it can corrupt data during zram compression. It can affect every swap device which supports asynchronous writeback and CRC checking as well as zRAM. Unfortunately, reuse_swap_page should be atomic so that we cannot wait on writeback in there so the approach in this patch is simply return false if we found it needs stable page. Although it increases memory footprint temporarily, it happens rarely and it should be reclaimed easily althoug it happened. Also, It would be better than waiting of IO completion, which is critial path for application latency. Cc: Hugh Dickins <hughd@xxxxxxxxxx> Cc: Darrick J. Wong <darrick.wong@xxxxxxxxxx> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx> --- * from v1 * use swap_info_struct instead of swapper_space->host inode - Hugh include/linux/swap.h | 3 ++- mm/swapfile.c | 20 +++++++++++++++++++- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index a56523cefb9b..55ff5593c193 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -150,8 +150,9 @@ enum { SWP_FILE = (1 << 7), /* set after swap_activate success */ SWP_AREA_DISCARD = (1 << 8), /* single-time swap area discards */ SWP_PAGE_DISCARD = (1 << 9), /* freed swap page-cluster discards */ + SWP_STABLE_WRITES = (1 << 10), /* no overwrite PG_writeback pages */ /* add others here before... */ - SWP_SCANNING = (1 << 10), /* refcount in scan_swap_map */ + SWP_SCANNING = (1 << 11), /* refcount in scan_swap_map */ }; #define SWAP_CLUSTER_MAX 32UL diff --git a/mm/swapfile.c b/mm/swapfile.c index 2210de290b54..66bc330c0b65 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -943,11 +943,25 @@ bool reuse_swap_page(struct page *page, int *total_mapcount) count = page_trans_huge_mapcount(page, total_mapcount); if (count <= 1 && PageSwapCache(page)) { count += page_swapcount(page); - if (count == 1 && !PageWriteback(page)) { + if (count != 1) + goto out; + if (!PageWriteback(page)) { delete_from_swap_cache(page); SetPageDirty(page); + } else { + swp_entry_t entry; + struct swap_info_struct *p; + + entry.val = page_private(page); + p = swap_info_get(entry); + if (p->flags & SWP_STABLE_WRITES) { + spin_unlock(&p->lock); + return false; + } + spin_unlock(&p->lock); } } +out: return count <= 1; } @@ -2447,6 +2461,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) error = -ENOMEM; goto bad_swap; } + + if (bdi_cap_stable_pages_required(inode_to_bdi(inode))) + p->flags |= SWP_STABLE_WRITES; + if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) { int cpu; -- 2.7.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>