On 10/30/22 17:29, Peter Xu wrote:
> Resolution
> ==========
>
> What this patch proposed is, besides using the vma lock, we can also use
> RCU to protect the pgtable page from being freed from under us when
> huge_pte_offset() is used. The idea is kind of similar to RCU fast-gup.
> Note that fast-gup is very safe regarding pmd unsharing even before vma
> lock, because fast-gup relies on RCU to protect walking any pgtable page,
> including another mm's.
>
> To apply the same idea to huge_pte_offset(), it means with proper RCU
> protection the pte_t* pointer returned from huge_pte_offset() can also be
> always safe to access and de-reference, along with the pgtable lock that
> was bound to the pgtable page.
>
> Patch Layout
> ============
>
> Patch 1 is a trivial cleanup that I noticed when working on this. Please
> shoot if anyone think I should just post it separately, or hopefully I can
> still just carry it over.
>
> Patch 2 is the gut of the patchset, describing how we should use the helper
> huge_pte_offset() correctly. Only a comment patch but should be the most
> important one, as the follow up patches are just trying to follow the rule
> it setup here.
>
> The rest patches resolve all the call sites of huge_pte_offset() to make
> sure either it's with the vma lock (which is perfectly good enough for
> safety in this case; the last patch commented on all those callers to make
> sure we won't miss a single case, and why they're safe). Besides, each of
> the patch will add rcu protection to one caller of huge_pte_offset().
>
> Tests
> =====
>
> Only lightly tested on hugetlb kselftests including uffd, no more errors
> triggered than current mm-unstable (hugetlb-madvise fails before/after
> here, with error "Unexpected number of free huge pages line 207"; haven't
> really got time to look into it).

Do not worry about the madvise test failure, that is caused by a recent
change.

Unless I am missing something, the basic strategy in this series is to
wrap calls to huge_pte_offset and subsequent ptep access with
rcu_read_lock/unlock calls. I must embarrassingly admit that it has been
a loooong time since I had to look at rcu usage and may not know what I am
talking about. However, I seem to recall that one needs to somehow flag
the data items being protected from update/freeing. I do not see anything
like that in the huge_pmd_unshare routine where pmd page pointer is
updated. Or, is it where the pmd page pointer is referenced in
huge_pte_offset?

Please ignore if you are certain of this rcu usage, otherwise I will spend
some time reeducating myself.
--
Mike Kravetz
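
For reference, the general RCU shape recalled above (readers fetch the
protected pointer inside a read-side section, the updater unpublishes it
and only frees it once no reader can still hold it) can be sketched as a
userspace toy model. Everything named toy_* is hypothetical and is not
the kernel API: the reader counter merely stands in for
rcu_read_lock()/rcu_read_unlock() and a grace period, the atomic store and
load play the roles of rcu_assign_pointer() and rcu_dereference(), and
struct toy_table stands in for a pgtable page. It does not show what
huge_pmd_unshare() or huge_pte_offset() do, and it does not settle whether
the series needs such annotations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the RCU read-side section: count active readers. */
static atomic_int toy_readers;

void toy_read_lock(void)   { atomic_fetch_add(&toy_readers, 1); }
void toy_read_unlock(void) { atomic_fetch_sub(&toy_readers, 1); }

/* Hypothetical stand-in for a page reachable through a shared pointer. */
struct toy_table {
	int entry;
};

static _Atomic(struct toy_table *) toy_shared; /* what readers dereference */
static struct toy_table *toy_pending;          /* unpublished, not yet freed */

/* Updater: publish a fully initialised table (role of rcu_assign_pointer()). */
int toy_publish(int value)
{
	struct toy_table *t = malloc(sizeof(*t));

	if (!t)
		return -1;
	t->entry = value;
	atomic_store(&toy_shared, t);
	return 0;
}

/*
 * Reader: fetch the pointer once, inside the read-side section (role of
 * rcu_dereference()). Returns 1 and fills *out if a table is published.
 */
int toy_lookup(int *out)
{
	struct toy_table *t;
	int found = 0;

	toy_read_lock();
	t = atomic_load(&toy_shared);
	if (t) {
		*out = t->entry;
		found = 1;
	}
	toy_read_unlock();
	return found;
}

/* Updater: unpublish the table but do not free it yet. */
int toy_unshare(void)
{
	if (toy_pending)
		return -1; /* this toy tracks a single deferred free */
	toy_pending = atomic_exchange(&toy_shared, NULL);
	return 0;
}

/*
 * Free the unpublished table only when no reader is inside a read-side
 * section (toy substitute for waiting out a grace period).
 * Returns 1 if the table was freed, 0 otherwise.
 */
int toy_reclaim(void)
{
	if (!toy_pending || atomic_load(&toy_readers) != 0)
		return 0;
	free(toy_pending);
	toy_pending = NULL;
	return 1;
}
```

In this model a toy_reclaim() call made while a reader is between
toy_read_lock() and toy_read_unlock() declines to free the table, and a
later call after the reader has left frees it. Real RCU tracks grace
periods quite differently; only the ordering of unpublish, wait, and free
carries over.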