Re: [PATCH] mm: fix the race between swapin_readahead and SWP_SYNCHRONOUS_IO path

Michal Hocko <mhocko@xxxxxxxxxx> · Mon, 9 Sep 2019 13:23:52 +0200

On Mon 09-09-19 09:35:39, Vinayak Menon wrote:
> 
> On 9/3/2019 5:47 PM, Vinayak Menon wrote:
> > On 9/3/2019 5:11 PM, Michal Hocko wrote:
> >> On Tue 03-09-19 11:43:16, Vinayak Menon wrote:
> >>> Hi Michal,
> >>>
> >>> Thanks for reviewing this.
> >>>
> >>>
> >>> On 9/2/2019 6:51 PM, Michal Hocko wrote:
> >>>> On Fri 30-08-19 18:13:31, Vinayak Menon wrote:
> >>>>> The following race is observed, due to which a process faulting
> >>>>> on a swap entry finds the page neither in swapcache nor swap. This
> >>>>> causes zram to give a zero filled page that gets mapped to the
> >>>>> process, resulting in a user space crash later.
> >>>>>
> >>>>> Consider parent and child processes Pa and Pb sharing the same swap
> >>>>> slot with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
> >>>>> Virtual address 'VA' of Pa and Pb points to the shared swap entry.
> >>>>>
> >>>>> Pa                                       Pb
> >>>>>
> >>>>> fault on VA                              fault on VA
> >>>>> do_swap_page                             do_swap_page
> >>>>> lookup_swap_cache fails                  lookup_swap_cache fails
> >>>>>                                          Pb scheduled out
> >>>>> swapin_readahead (deletes zram entry)
> >>>>> swap_free (makes swap_count 1)
> >>>>>                                          Pb scheduled in
> >>>>>                                          swap_readpage (swap_count == 1)
> >>>>>                                          Takes SWP_SYNCHRONOUS_IO path
> >>>>>                                          zram entry absent
> >>>>>                                          zram gives a zero filled page
> >>>> This sounds like a zram issue, right? Why is a generic swap path changed
> >>>> then?
> >>> I think zram entry being deleted by Pa and zram giving out a zeroed page to Pb is normal.
> >> Isn't that a data loss? The race you mentioned shouldn't be possible
> >> with the standard swap storage AFAIU. If that is really the case then
> >> the zram needs a fix rather than a generic path. Or at least a very good
> >> explanation why the generic path is a preferred way.
> >
> > AFAIK, there isn't a data loss because, before deleting the entry,
> > swap_slot_free_notify makes sure that page is in swapcache and marks
> > the page dirty to ensure a swap out before reclaim. I am referring to
> > the comment about this in swap_slot_free_notify. In the case of this
> > race too, the page brought to swapcache by Pa is still in swapcache.
> > It is just that Pb failed to find it due to the race.
> >
> > Yes, this race will not happen for standard swap storage and only for
> > those block devices that set disk->fops->swap_slot_free_notify and
> > have SWP_SYNCHRONOUS_IO set (which seems to be only zram).
> >
> > Now considering that zram works as expected, the fix is in generic
> > path because the race is due to the bug in SWP_SYNCHRONOUS_IO handling
> > in do_swap_page. And it is only the SWP_SYNCHRONOUS_IO handling in
> > generic path which is modified.
> >
> 
> Hi Michal,
> 
> Do you see any concerns with the patch or the explanation of the problem?

I am sorry, I didn't have time to give this more serious thought. You
need somebody who is more familiar with the code and has the time to
look into it.
-- 
Michal Hocko
SUSE Labs
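
For readers following the thread, the Pa/Pb interleaving quoted above can
be replayed as a small deterministic toy model. This is only a sketch
under stated assumptions, not kernel code: the class ToySwap, the helper
run_race, the slot number and the payload are all hypothetical, and the
methods merely borrow the names of the kernel functions mentioned in the
thread. The one behaviour taken from the report is that reading an absent
zram entry yields a zero-filled page.

```python
PAGE_SIZE = 8                     # toy page size, not the real one
ZERO_PAGE = bytes(PAGE_SIZE)      # what the modelled zram returns for an absent entry


class ToySwap:
    """Hypothetical model of one swap device backed by zram."""

    def __init__(self, slot, data, swap_count):
        self.zram = {slot: data}              # backing store: slot -> page data
        self.swap_count = {slot: swap_count}  # number of users of the slot
        self.swap_cache = {}                  # swap cache: slot -> page data

    def lookup_swap_cache(self, slot):
        # Returns None on a miss.
        return self.swap_cache.get(slot)

    def zram_read(self, slot):
        # An absent entry reads back as a zero-filled page.
        return self.zram.get(slot, ZERO_PAGE)

    def swapin_readahead(self, slot):
        # Read through the swap cache; the slot-free notification then
        # drops the zram copy because the cached page now holds the data.
        page = self.zram_read(slot)
        self.swap_cache[slot] = page
        self.zram.pop(slot, None)
        return page

    def swap_free(self, slot):
        self.swap_count[slot] -= 1

    def sync_io_read(self, slot):
        # Modelled SWP_SYNCHRONOUS_IO path: taken only when swap_count == 1,
        # and it reads the device directly, bypassing the swap cache.
        assert self.swap_count[slot] == 1
        return self.zram_read(slot)


def run_race():
    slot, data = 42, b"PAYLOAD!"
    swap = ToySwap(slot, data, swap_count=2)

    # Pa and Pb both fault on VA and both miss the swap cache.
    pa_hit = swap.lookup_swap_cache(slot)
    pb_hit = swap.lookup_swap_cache(slot)

    # Pb is scheduled out; Pa completes its fault.
    pa_page = swap.swapin_readahead(slot)   # deletes the zram entry
    swap.swap_free(slot)                    # swap_count: 2 -> 1

    # Pb is scheduled in and acts on its stale cache miss.
    pb_page = swap.sync_io_read(slot)

    return pa_hit, pb_hit, pa_page, pb_page, swap.lookup_swap_cache(slot)
```

In this model Pb ends up with a zero-filled page while the real data is
still sitting in the swap cache, which matches the explanation in the
quoted text that the page is not lost but that Pb fails to find it.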