Hi Vinayak,

On Fri, Aug 30, 2019 at 06:13:31PM +0530, Vinayak Menon wrote:
> The following race is observed, due to which a process faulting
> on a swap entry finds the page neither in the swapcache nor in swap.
> This causes zram to give a zero filled page that gets mapped to the
> process, resulting in a user space crash later.
>
> Consider parent and child processes Pa and Pb sharing the same swap
> slot with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
> Virtual address 'VA' of Pa and Pb points to the shared swap entry.
>
> Pa                                      Pb
>
> fault on VA                             fault on VA
> do_swap_page                            do_swap_page
> lookup_swap_cache fails                 lookup_swap_cache fails
>                                         Pb scheduled out
> swapin_readahead (deletes zram entry)
> swap_free (makes swap_count 1)
>                                         Pb scheduled in
>                                         swap_readpage (swap_count == 1)
>                                         Takes SWP_SYNCHRONOUS_IO path
>                                         zram entry absent
>                                         zram gives a zero filled page
>
> Fix this by reading the swap_count before lookup_swap_cache, which
> conforms to the order in which the page is added to the swap cache and
> the swap count is decremented in do_swap_page. In the race case above,
> this will let Pb take the readahead path and thus pick the proper page
> from the swap cache.

Thanks for the report, Vinayak.

It's a zram-specific issue because zram deallocates the block
unconditionally once the read IO is done. The expectation was that the
dirty page stays in the swap cache, but with SWP_SYNCHRONOUS_IO that's
no longer true, so I want to resolve the issue in zram-specific code,
not in the general path.

An idea in my mind is that swap_slot_free_notify should check the slot
reference counter and, if it's higher than 1, not free the slot until
the count drops back to 1. A rough sketch of what I mean is at the
bottom of this mail. What do you think?

>
> Signed-off-by: Vinayak Menon <vinmenon@xxxxxxxxxxxxxx>
> ---
>  mm/memory.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e0c232f..22643aa 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2744,6 +2744,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	struct page *page = NULL, *swapcache;
>  	struct mem_cgroup *memcg;
>  	swp_entry_t entry;
> +	struct swap_info_struct *si;
> +	bool skip_swapcache = false;
>  	pte_t pte;
>  	int locked;
>  	int exclusive = 0;
> @@ -2771,15 +2773,24 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>
>
>  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> +
> +	/*
> +	 * lookup_swap_cache below can fail and before the SWP_SYNCHRONOUS_IO
> +	 * check is made, another process can populate the swapcache, delete
> +	 * the swap entry and decrement the swap count. So decide on taking
> +	 * the SWP_SYNCHRONOUS_IO path before the lookup. In the event of the
> +	 * race described, the victim process will find a swap_count > 1
> +	 * and can then take the readahead path instead of SWP_SYNCHRONOUS_IO.
> +	 */
> +	si = swp_swap_info(entry);
> +	if (si->flags & SWP_SYNCHRONOUS_IO && __swap_count(entry) == 1)
> +		skip_swapcache = true;
> +
>  	page = lookup_swap_cache(entry, vma, vmf->address);
>  	swapcache = page;
>
>  	if (!page) {
> -		struct swap_info_struct *si = swp_swap_info(entry);
> -
> -		if (si->flags & SWP_SYNCHRONOUS_IO &&
> -			__swap_count(entry) == 1) {
> -			/* skip swapcache */
> +		if (skip_swapcache) {
>  			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
>  					vmf->address);
>  			if (page) {
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
>
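For the record, here is roughly what I have in mind, against
swap_slot_free_notify() in mm/page_io.c. Completely untested, just to
illustrate the idea: skip the notify (and the redirty) while someone
else still holds a reference on the slot, so the compressed copy stays
in zram for the other swapin. Whether reading __swap_count() here is
safe without further locking needs a closer look.

static void swap_slot_free_notify(struct page *page)
{
	struct swap_info_struct *sis;
	struct gendisk *disk;
	swp_entry_t entry;

	/*
	 * There is no guarantee the page is in the swap cache (e.g. the
	 * software suspend code reads against non-swapcache pages).
	 */
	if (unlikely(!PageSwapCache(page)))
		return;

	sis = page_swap_info(page);
	if (!(sis->flags & SWP_BLKDEV))
		return;

	entry.val = page_private(page);
	disk = sis->bdev->bd_disk;

	/*
	 * If another process still references this slot (swap_count > 1,
	 * e.g. parent and child sharing it after fork), keep the backing
	 * block so the other swapin can still find the data in zram.
	 * Only when we hold the last reference is it safe to free the
	 * block and redirty the page so it gets written out again.
	 */
	if (disk->fops->swap_slot_free_notify &&
	    __swap_count(entry) == 1) {
		SetPageDirty(page);
		disk->fops->swap_slot_free_notify(sis->bdev,
						  swp_offset(entry));
	}
}

That would keep the fix out of the do_swap_page fast path entirely.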