Re: FAILED: patch "[PATCH] mm: fix unexpected zeroed page mapping with zram swap" failed to apply to 5.4-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 05, 2022 at 09:27:49AM -0700, Minchan Kim wrote:
> On Mon, Apr 18, 2022 at 10:05:15AM +0200, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> > 
> > The patch below does not apply to the 5.4-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to <stable@xxxxxxxxxxxxxxx>.
> > 
> > thanks,
> > 
> > greg k-h
> 
> < snip >
> 
> This is backport for 5.4-stable.
> 
> >From 587a2b370d803766fd57778cdd1f05511658a246 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@xxxxxxxxxx>
> Date: Thu, 14 Apr 2022 19:13:46 -0700
> Subject: [PATCH] mm: fix unexpected zeroed page mapping with zram swap
> 
> e914d8f00391520ecc4495dd0ca0124538ab7119 upstream.
> 
> Two processes under CLONE_VM cloning, user process can be corrupted by
> seeing zeroed page unexpectedly.
> 
>       CPU A                        CPU B
> 
>   do_swap_page                do_swap_page
>   SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
>   swap_readpage valid data
>     swap_slot_free_notify
>       delete zram entry
>                               swap_readpage zeroed(invalid) data
>                               pte_lock
>                               map the *zero data* to userspace
>                               pte_unlock
>   pte_lock
>   if (!pte_same)
>     goto out_nomap;
>   pte_unlock
>   return and next refault will
>   read zeroed data
> 
> The swap_slot_free_notify is bogus for CLONE_VM case since it doesn't
> increase the refcount of swap slot at copy_mm so it couldn't catch up
> whether it's safe or not to discard data from backing device.  In the
> case, only the lock it could rely on to synchronize swap slot freeing is
> page table lock.  Thus, this patch gets rid of the swap_slot_free_notify
> function.  With this patch, CPU A will see correct data.
> 
>       CPU A                        CPU B
> 
>   do_swap_page                do_swap_page
>   SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
>                               swap_readpage original data
>                               pte_lock
>                               map the original data
>                               swap_free
>                                 swap_range_free
>                                   bd_disk->fops->swap_slot_free_notify
>   swap_readpage read zeroed data
>                               pte_unlock
>   pte_lock
>   if (!pte_same)
>     goto out_nomap;
>   pte_unlock
>   return
>   on next refault will see mapped data by CPU B
> 
> The concern of the patch would increase memory consumption since it
> could keep wasted memory with compressed form in zram as well as
> uncompressed form in address space.  However, most of cases of zram uses
> no readahead and do_swap_page is followed by swap_free so it will free
> the compressed form from in zram quickly.
> 
> Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@xxxxxxxxxx
> Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
> Reported-by: Ivan Babrou <ivan@xxxxxxxxxxxxxx>
> Tested-by: Ivan Babrou <ivan@xxxxxxxxxxxxxx>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Nitin Gupta <ngupta@xxxxxxxxxx>
> Cc: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>	[4.14+]
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> ---
>  mm/page_io.c | 54 ----------------------------------------------------
>  1 file changed, 54 deletions(-)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index bcf27d057253..f0e3f2be7b44 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -69,54 +69,6 @@ void end_swap_bio_write(struct bio *bio)
>  	bio_put(bio);
>  }
>  
> -static void swap_slot_free_notify(struct page *page)
> -{
> -	struct swap_info_struct *sis;
> -	struct gendisk *disk;
> -	swp_entry_t entry;
> -
> -	/*
> -	 * There is no guarantee that the page is in swap cache - the software
> -	 * suspend code (at least) uses end_swap_bio_read() against a non-
> -	 * swapcache page.  So we must check PG_swapcache before proceeding with
> -	 * this optimization.
> -	 */
> -	if (unlikely(!PageSwapCache(page)))
> -		return;
> -
> -	sis = page_swap_info(page);
> -	if (!(sis->flags & SWP_BLKDEV))
> -		return;
> -
> -	/*
> -	 * The swap subsystem performs lazy swap slot freeing,
> -	 * expecting that the page will be swapped out again.
> -	 * So we can avoid an unnecessary write if the page
> -	 * isn't redirtied.
> -	 * This is good for real swap storage because we can
> -	 * reduce unnecessary I/O and enhance wear-leveling
> -	 * if an SSD is used as the as swap device.
> -	 * But if in-memory swap device (eg zram) is used,
> -	 * this causes a duplicated copy between uncompressed
> -	 * data in VM-owned memory and compressed data in
> -	 * zram-owned memory.  So let's free zram-owned memory
> -	 * and make the VM-owned decompressed page *dirty*,
> -	 * so the page should be swapped out somewhere again if
> -	 * we again wish to reclaim it.
> -	 */
> -	disk = sis->bdev->bd_disk;
> -	entry.val = page_private(page);
> -	if (disk->fops->swap_slot_free_notify && __swap_count(entry) == 1) {
> -		unsigned long offset;
> -
> -		offset = swp_offset(entry);
> -
> -		SetPageDirty(page);
> -		disk->fops->swap_slot_free_notify(sis->bdev,
> -				offset);
> -	}
> -}
> -
>  static void end_swap_bio_read(struct bio *bio)
>  {
>  	struct page *page = bio_first_page_all(bio);
> @@ -132,7 +84,6 @@ static void end_swap_bio_read(struct bio *bio)
>  	}
>  
>  	SetPageUptodate(page);
> -	swap_slot_free_notify(page);
>  out:
>  	unlock_page(page);
>  	WRITE_ONCE(bio->bi_private, NULL);
> @@ -371,11 +322,6 @@ int swap_readpage(struct page *page, bool synchronous)
>  
>  	ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
>  	if (!ret) {
> -		if (trylock_page(page)) {
> -			swap_slot_free_notify(page);
> -			unlock_page(page);
> -		}
> -
>  		count_vm_event(PSWPIN);
>  		return 0;
>  	}
> -- 
> 2.36.0.512.ge40c2bad7a-goog
> 

Both now queued up, thanks.

greg k-h



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux