"Kasireddy, Vivek" <vivek.kasireddy@xxxxxxxxx> writes:

> Hi Alistair,

Hi Vivek,

>> I wonder if we actually need the flag? IIUC it is already used for more
>> than just KSM. For example it can be called as part of fault handling by
>> set_pte_at_notify() in wp_page_copy().
> Yes, I noticed that but what I really meant is I'd put all these prior instances
> of change_pte in one category using the flag. Without the flag, KVM, the only
> user that currently has a callback for change_pte, would get notified, which
> may not be appropriate. Note that the change_pte callback for KVM was
> added (based on Git log) for KSM updates and it is not clear to me if that
> is still the case.

It is certainly now called from contexts other than KSM though. I have
no idea whether that's a problem, nor if adding more callers would
actually be an issue, so I understand the motivation for the flag.

>> > Would that work for your HMM use-case -- assuming we call change_pte after
>> > taking PTL?
>>
>> I suspect being called under the PTL could be an issue. For HMM we use a
>> combination of sequence numbers and a mutex to synchronise PTEs. To
>> avoid calling the notifier while holding PTL we might be able to record
>> the sequence number (subscriptions->invalidate_seq) while holding PTL,
>> release the PTL and provide that sequence number to the notifier
>> callback along with the PTE.
>>
>> Some form of mmu_interval_read_retry() could then be used to detect if
>> the PTE has changed between dropping the PTL and calling the
>> update_pte()/change_pte() notifier.
>>
>> Of course if your design can handle being called while holding the PTL
>> then the above is probably unnecessarily complex for your use-case.
> Yes, I believe we can handle it while holding the PTL.
>
>> My primary issue with this patch is the notifier is called without the
>> PTL while providing a PTE value. Without some form of synchronisation it
>> isn't safe to use the result of eg.
>> pte_page(pte) or pte_write(pte) in
>> the notifier callback. Based on your comments it seems udmabuf might
>> have some other synchronisation that makes it safe, but being external
>> to the notifier calls makes it hard to reason about.
> I intend to fix the PTL issue in v2 but I am still not sure what is the best
> thing to do as far as the notifier is concerned given the following options:
> - Keep this patch (and notifier name) but ensure that it is called under PTL

I think that's preferable to adding a flag, so long as it's implemented
and documented that this is called whenever a PTE is updated. Otherwise
a third user will come along and have the same problem we've currently
got with KVM's usage.

> - Drop this patch and expand the use of change_pte but add the flag to
>   distinguish between prior usage and new usage
> - Keep this patch but don't include the PTE or the pfn of the new page as
>   part of the notifier. In other words, just have this:
>   mmu_notifier_update_mapping(struct mm_struct *mm, unsigned long address)
> This way, in the udmabuf driver, we could get the new page from the page cache
> as soon as we get notified:
> 	mapoff = linear_page_index(vma, address);
> 	new_page = find_get_page(vma->vm_file->f_mapping, mapoff);
> This last option would probably limit the new notifier to the udmabuf
> use-case but I do not intend to pursue it as you suggested that you are
> also interested in a new notifier associated with PTE updates.

Actually the last option isn't limiting, assuming it's sent whenever the
PTE is updated. It just means users have to use hmm_range_fault() or
some equivalent that already enforces proper synchronisation if they
need the actual PTE value. That seems fine to me.

> Thanks,
> Vivek
>
>>
>> - Alistair
>>
>> > Thanks,
>> > Vivek
>> >
>> >>
>> >> Thanks.
>> >>
>> >> > Thanks,
>> >> > Vivek
>> >> >
>> >> >>
>> >> >> > +		return ret;
>> >> >> >
>> >> >> >  	ret = 0;
>> >> >> >
>> >> >> > @@ -6223,6 +6227,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>> >> >> >  	 */
>> >> >> >  	if (need_wait_lock)
>> >> >> >  		folio_wait_locked(folio);
>> >> >> > +	if (!ret)
>> >> >> > +		mmu_notifier_update_mapping(vma->vm_mm, address,
>> >> >> > +					    pte_pfn(*ptep));
>> >> >> >  	return ret;
>> >> >> >  }
>> >> >> >
>> >> >> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
>> >> >> > index 50c0dde1354f..6421405334b9 100644
>> >> >> > --- a/mm/mmu_notifier.c
>> >> >> > +++ b/mm/mmu_notifier.c
>> >> >> > @@ -441,6 +441,23 @@ void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
>> >> >> >  	srcu_read_unlock(&srcu, id);
>> >> >> >  }
>> >> >> >
>> >> >> > +void __mmu_notifier_update_mapping(struct mm_struct *mm, unsigned long address,
>> >> >> > +				   unsigned long pfn)
>> >> >> > +{
>> >> >> > +	struct mmu_notifier *subscription;
>> >> >> > +	int id;
>> >> >> > +
>> >> >> > +	id = srcu_read_lock(&srcu);
>> >> >> > +	hlist_for_each_entry_rcu(subscription,
>> >> >> > +				 &mm->notifier_subscriptions->list, hlist,
>> >> >> > +				 srcu_read_lock_held(&srcu)) {
>> >> >> > +		if (subscription->ops->update_mapping)
>> >> >> > +			subscription->ops->update_mapping(subscription, mm,
>> >> >> > +							  address, pfn);
>> >> >> > +	}
>> >> >> > +	srcu_read_unlock(&srcu, id);
>> >> >> > +}
>> >> >> > +
>> >> >> >  static int mn_itree_invalidate(struct mmu_notifier_subscriptions *subscriptions,
>> >> >> >  			       const struct mmu_notifier_range *range)
>> >> >> >  {
>> >> >> > diff --git a/mm/shmem.c b/mm/shmem.c
>> >> >> > index 2f2e0e618072..e59eb5fafadb 100644
>> >> >> > --- a/mm/shmem.c
>> >> >> > +++ b/mm/shmem.c
>> >> >> > @@ -77,6 +77,7 @@ static struct vfsmount *shm_mnt;
>> >> >> >  #include <linux/fcntl.h>
>> >> >> >  #include <uapi/linux/memfd.h>
>> >> >> >  #include <linux/rmap.h>
>> >> >> > +#include <linux/mmu_notifier.h>
>> >> >> >  #include <linux/uuid.h>
>> >> >> >
>> >> >> >  #include <linux/uaccess.h>
>> >> >> >
>> >> >> > @@ -2164,8 +2165,12 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
>> >> >> >  					  gfp, vma, vmf, &ret);
>> >> >> >  	if (err)
>> >> >> >  		return vmf_error(err);
>> >> >> > -	if (folio)
>> >> >> > +	if (folio) {
>> >> >> >  		vmf->page = folio_file_page(folio, vmf->pgoff);
>> >> >> > +		if (ret == VM_FAULT_LOCKED)
>> >> >> > +			mmu_notifier_update_mapping(vma->vm_mm, vmf->address,
>> >> >> > +						    page_to_pfn(vmf->page));
>> >> >> > +	}
>> >> >> >  	return ret;
>> >> >> >  }