On 11/7/19 6:04 PM, Davidlohr Bueso wrote: > On Thu, 07 Nov 2019, Mike Kravetz wrote: > >> Note that huge_pmd_share now increments the page count with the semaphore >> held just in read mode. It is OK to do increments in parallel without >> synchronization. However, we don't want anyone else changing the count >> while that check in huge_pmd_unshare is happening. Hence, the need for >> taking the semaphore in write mode. > > This would be a nice addition to the changelog methinks. Last night I remembered there is one place where we currently take i_mmap_rwsem in read mode and potentially call huge_pmd_unshare. That is in try_to_unmap_one. Yes, there is a potential race here today. But that race is somewhat contained as you need two threads doing some combination of page migration and page poisoning to race. This change now allows migration or poisoning to race with page fault. I would really prefer if we do not open up the race window in this manner. Getting this right in the try_to_unmap_one case is a bit tricky. I had code to do this in the past that was part of a bigger hugetlb synchronization change. All those changes got reverted (commit ddeaab32a89f), but I believe it is possible to change try_to_unmap_one calling sequences without introducing other issues. Bottom line is that more changes are needed in this patch. I'll work on those changes unless someone else volunteers. It will likely take me one or two days to come up with and test proposed changes. -- Mike Kravetz