On Fri, Dec 15, 2017 at 09:33:03AM +0800, Huang, Ying wrote: > Michal Hocko <mhocko@xxxxxxxxxx> writes: > > > On Thu 14-12-17 21:38:32, Huang, Ying wrote: > >> From: Huang Ying <ying.huang@xxxxxxxxx> > >> > >> When the swapin is performed, after getting the swap entry information > >> from the page table, system will swap in the swap entry, without any > >> lock held to prevent the swap device from being swapoff. This may > >> cause the race like below, > >> > >> CPU 1 CPU 2 > >> ----- ----- > >> do_swap_page > >> swapin_readahead > >> __read_swap_cache_async > >> swapoff swapcache_prepare > >> p->swap_map = NULL __swap_duplicate > >> p->swap_map[?] /* !!! NULL pointer access */ > >> > >> Because swap off is usually done when system shutdown only, the race > >> may not hit many people in practice. But it is still a race need to > >> be fixed. > >> > >> To fix the race, get_swap_device() is added to prevent swap device > >> from being swapoff until put_swap_device() is called. When > >> get_swap_device() is called, the caller should have some locks (like > >> PTL, page lock, or swap_info_struct->lock) held to guarantee the swap > >> entry is valid, or check the origin of swap entry again to make sure > >> the swap device hasn't been swapoff already. > >> > >> Because swapoff() is very race code path, to make the normal path runs > > > > s@race@rare@ I suppose > > Oops, thanks for pointing this out! > > >> as fast as possible, SRCU instead of reference count is used to > >> implement get/put_swap_device(). From get_swap_device() to > >> put_swap_device(), the reader side of SRCU is locked, so > >> synchronize_srcu() in swapoff() will wait until put_swap_device() is > >> called. > > > > It is quite unfortunate to pull SRCU as a dependency to the core kernel. > > Different attempts to do this have failed in the past. This one is > > slightly different though because I would suspect that those tiny > > systems do not configure swap. But who knows, maybe they do. > > I remember Paul said there is a tiny implementation of SRCU which can > fit this requirement. > > Hi, Paul, whether my memory is correct? Yes, if you build with CONFIG_SMP=n, then you will get Tiny SRCU, which is quite compact. Thanx, Paul > > Anyway, if you are worried about performance then I would expect some > > numbers to back that worry. So why don't simply start with simpler > > ref count based and then optimize it later based on some actual numbers. > > My -V1 is based on ref count. I think the performance difference should > be not measurable. The idea is that swapoff() is so rare, so we should > accelerate normal path as much as possible, even if this will cause slow > down in swapoff. If we cannot use SRCU in the end, we may try RCU, > preempt off (for stop_machine()), etc. > > > Btw. have you considered pcp refcount framework. I would suspect that > > this would give you close to SRCU performance. > > No. I think pcp refcount doesn't fit here. You should hold a initial > refcount for pcp refcount, it isn't the case here. > > Best Regards, > Huang, Ying > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>