On Mon 09-09-19 09:35:39, Vinayak Menon wrote:
>
> On 9/3/2019 5:47 PM, Vinayak Menon wrote:
> > On 9/3/2019 5:11 PM, Michal Hocko wrote:
> >> On Tue 03-09-19 11:43:16, Vinayak Menon wrote:
> >>> Hi Michal,
> >>>
> >>> Thanks for reviewing this.
> >>>
> >>>
> >>> On 9/2/2019 6:51 PM, Michal Hocko wrote:
> >>>> On Fri 30-08-19 18:13:31, Vinayak Menon wrote:
> >>>>> The following race is observed due to which a process faulting
> >>>>> on a swap entry, finds the page neither in swapcache nor swap. This
> >>>>> causes zram to give a zero filled page that gets mapped to the
> >>>>> process, resulting in a user space crash later.
> >>>>>
> >>>>> Consider parent and child processes Pa and Pb sharing the same swap
> >>>>> slot with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
> >>>>> Virtual address 'VA' of Pa and Pb points to the shared swap entry.
> >>>>>
> >>>>> Pa                                       Pb
> >>>>>
> >>>>> fault on VA                              fault on VA
> >>>>> do_swap_page                             do_swap_page
> >>>>> lookup_swap_cache fails                  lookup_swap_cache fails
> >>>>>                                          Pb scheduled out
> >>>>> swapin_readahead (deletes zram entry)
> >>>>> swap_free (makes swap_count 1)
> >>>>>                                          Pb scheduled in
> >>>>>                                          swap_readpage (swap_count == 1)
> >>>>>                                          Takes SWP_SYNCHRONOUS_IO path
> >>>>>                                          zram entry absent
> >>>>>                                          zram gives a zero filled page
> >>>> This sounds like a zram issue, right? Why is a generic swap path changed
> >>>> then?
> >>> I think zram entry being deleted by Pa and zram giving out a zeroed page to Pb is normal.
> >> Isn't that a data loss? The race you mentioned shouldn't be possible
> >> with the standard swap storage AFAIU. If that is really the case then
> >> the zram needs a fix rather than a generic path. Or at least a very good
> >> explanation why the generic path is a preferred way.
> >
> > AFAIK, there isn't a data loss because, before deleting the entry,
> > swap_slot_free_notify makes sure that page is in swapcache and marks
> > the page dirty to ensure a swap out before reclaim. I am referring to
> > the comment about this in swap_slot_free_notify. In the case of this
> > race too, the page brought to swapcache by Pa is still in swapcache.
> > It is just that Pb failed to find it due to the race.
> >
> > Yes, this race will not happen for standard swap storage and only for
> > those block devices that set disk->fops->swap_slot_free_notify and
> > have SWP_SYNCHRONOUS_IO set (which seems to be only zram).
> >
> > Now considering that zram works as expected, the fix is in generic
> > path because the race is due to the bug in SWP_SYNCHRONOUS_IO handling
> > in do_swap_page. And it is only the SWP_SYNCHRONOUS_IO handling in
> > generic path which is modified.
>
> Hi Michal,
>
> Do you see any concerns with the patch or explanation of the problem?

I am sorry, I didn't have time to give this a more serious thought. You
need somebody more familiar with the code and time to look into it.
-- 
Michal Hocko
SUSE Labs
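
For anyone following the thread, the interleaving in the quoted report can be replayed deterministically with a toy model. The sketch below is plain userspace C, not kernel code, and it is not the proposed patch. Every name in it (struct toy_swap, toy_zram_read, toy_pa_swapin, toy_pb_resume, toy_replay_race) is hypothetical and only mirrors the steps listed in the report: both swapcache lookups miss, Pa's swapin fills the swapcache and drops the zram copy, swap_free lowers the count to 1, and Pb then reads the device directly on the swap_count == 1 path. It assumes, as the report states, that reading an absent zram slot returns a zero-filled page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16 /* toy page size, not the kernel's PAGE_SIZE */

/* Hypothetical model of one shared swap slot; not a kernel structure. */
struct toy_swap {
    int  swap_count;                    /* references to the slot */
    bool zram_has_entry;                /* is the device copy still stored? */
    char zram_data[TOY_PAGE_SIZE];
    bool in_swapcache;                  /* is the page in the swapcache? */
    char swapcache_data[TOY_PAGE_SIZE];
};

/* Toy device read: an absent slot reads back as zeroes (assumption
 * taken from the report, not from the zram source). */
static void toy_zram_read(const struct toy_swap *s, char *dst)
{
    if (s->zram_has_entry)
        memcpy(dst, s->zram_data, TOY_PAGE_SIZE);
    else
        memset(dst, 0, TOY_PAGE_SIZE);
}

/* Pa: swapin through the swapcache, then the slot-free notification
 * drops the device copy and swap_free lowers the count. */
static void toy_pa_swapin(struct toy_swap *s, char *dst)
{
    toy_zram_read(s, s->swapcache_data);
    s->in_swapcache = true;
    s->zram_has_entry = false;          /* modelled swap_slot_free_notify */
    memcpy(dst, s->swapcache_data, TOY_PAGE_SIZE);
    assert(s->swap_count > 0);
    s->swap_count--;                    /* modelled swap_free */
}

/* Pb: resumes after its earlier, now stale, swapcache miss. */
static void toy_pb_resume(struct toy_swap *s, char *dst)
{
    if (s->swap_count == 1)
        toy_zram_read(s, dst);          /* direct read, swapcache skipped */
    else if (s->in_swapcache)
        memcpy(dst, s->swapcache_data, TOY_PAGE_SIZE);
    else
        toy_zram_read(s, dst);
}

/* True when every byte of the toy page is zero. */
static bool toy_page_is_zero(const char *page)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* Replays the interleaving from the report in a single thread. */
static void toy_replay_race(char *pa_page, char *pb_page)
{
    struct toy_swap s = { .swap_count = 2, .zram_has_entry = true };
    memcpy(s.zram_data, "shared-data", sizeof("shared-data"));

    /* Pa and Pb both look up the swapcache here and both miss. */
    assert(!s.in_swapcache);

    /* Pb is scheduled out; Pa completes its swapin. */
    toy_pa_swapin(&s, pa_page);

    /* Pb is scheduled in and acts on its stale lookup result. */
    toy_pb_resume(&s, pb_page);
}
```

Calling toy_replay_race(pa, pb) with two TOY_PAGE_SIZE buffers leaves the original bytes in pa and only zeroes in pb, even though the page Pa brought in is still sitting in the toy swapcache. That matches the outcome the report describes for Pb.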