On 9/3/2019 5:11 PM, Michal Hocko wrote: > On Tue 03-09-19 11:43:16, Vinayak Menon wrote: >> Hi Michal, >> >> Thanks for reviewing this. >> >> >> On 9/2/2019 6:51 PM, Michal Hocko wrote: >>> On Fri 30-08-19 18:13:31, Vinayak Menon wrote: >>>> The following race is observed due to which a processes faulting >>>> on a swap entry, finds the page neither in swapcache nor swap. This >>>> causes zram to give a zero filled page that gets mapped to the >>>> process, resulting in a user space crash later. >>>> >>>> Consider parent and child processes Pa and Pb sharing the same swap >>>> slot with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set. >>>> Virtual address 'VA' of Pa and Pb points to the shared swap entry. >>>> >>>> Pa Pb >>>> >>>> fault on VA fault on VA >>>> do_swap_page do_swap_page >>>> lookup_swap_cache fails lookup_swap_cache fails >>>> Pb scheduled out >>>> swapin_readahead (deletes zram entry) >>>> swap_free (makes swap_count 1) >>>> Pb scheduled in >>>> swap_readpage (swap_count == 1) >>>> Takes SWP_SYNCHRONOUS_IO path >>>> zram enrty absent >>>> zram gives a zero filled page >>> This sounds like a zram issue, right? Why is a generic swap path changed >>> then? >> >> I think zram entry being deleted by Pa and zram giving out a zeroed page to Pb is normal. > Isn't that a data loss? The race you mentioned shouldn't be possible > with the standard swap storage AFAIU. If that is really the case then > the zram needs a fix rather than a generic path. Or at least a very good > explanation why the generic path is a preferred way. AFAIK, there isn't a data loss because, before deleting the entry, swap_slot_free_notify makes sure that page is in swapcache and marks the page dirty to ensure a swap out before reclaim. I am referring to the comment about this in swap_slot_free_notify. In the case of this race too, the page brought to swapcache by Pa is still in swapcache. It is just that Pb failed to find it due to the race. Yes, this race will not happen for standard swap storage and only for those block devices that set disk->fops->swap_slot_free_notify and have SWP_SYNCHRONOUS_IO set (which seems to be only zram). Now considering that zram works as expected, the fix is in generic path because the race is due to the bug in SWP_SYNCHRONOUS_IO handling in do_swap_page. And it is only the SWP_SYNCHRONOUS_IO handling in generic path which is modified.