Hi, John,

Firstly, thanks for taking a look at the whole set.

On Wed, Dec 07, 2022 at 02:36:21PM -0800, John Hubbard wrote:
> > @@ -5886,8 +5866,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	 * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
> >  	 * properly handle it.
> >  	 */
> > -	if (!pte_present(entry))
> > +	if (!pte_present(entry)) {
> > +		if (unlikely(is_hugetlb_entry_migration(entry))) {
> > +			/*
> > +			 * Release fault lock first because the vma lock is
> > +			 * needed to guard the huge_pte_lockptr() later in
> > +			 * migration_entry_wait_huge(). The vma lock will
> > +			 * be released there.
> > +			 */
> > +			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> > +			migration_entry_wait_huge(vma, ptep);
> > +			return 0;
> 
> Oh, but now (and also one other, pre-existing case, above)
> hugetlb_fault() is returning with the vma lock held.

Note that here migration_entry_wait_huge() will release it.

Sorry, it's definitely not straightforward, but I couldn't come up with a
better solution either, because we need the vma lock to protect the
spinlock, which is used deep in the migration code path.

That's also why I added the detailed comment above; the sentence "The vma
lock will be released there" is there exactly for that.

> This is in contrast
> with most of the rest of the function, which takes great care to release
> locks before returning.
> 
> Which makes this new case really quite irregular and makes the overall
> locking harder to follow. It would be ideal to avoid doing this! But at
> the very least, there should be a little note above hugetlb_fault(),
> explaining this deviation and how it fits in with the locking rules.
> 
> Do we really have to structure it this way, though?
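
To spell out the hand-off: hugetlb_fault() enters migration_entry_wait_huge()
with the vma read lock held, the callee takes the pgtable spinlock (which is
what the vma lock was protecting), and only then drops the vma lock before
sleeping on the migration entry.  Roughly something like below; this is only
a sketch to show the idea, reusing helpers that exist in the current tree
(huge_pte_lockptr(), hugetlb_vma_unlock_read(), __migration_entry_wait()),
so the exact code in the patch may differ:

void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *ptep)
{
	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, ptep);
	pte_t pte;

	/* The caller holds the vma read lock; it keeps ptep and ptl stable */
	hugetlb_vma_assert_locked(vma);
	spin_lock(ptl);
	pte = huge_ptep_get(ptep);

	if (unlikely(!is_hugetlb_entry_migration(pte))) {
		/* Raced with the migration completing; nothing to wait for */
		spin_unlock(ptl);
		hugetlb_vma_unlock_read(vma);
		return;
	}

	/*
	 * With the pgtable spinlock held the vma lock is not needed
	 * anymore, so drop it before sleeping on the migration entry.
	 */
	hugetlb_vma_unlock_read(vma);
	__migration_entry_wait(vma->vm_mm, ptep, ptl);
}

-- 
Peter Xu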