On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote:
> On 12.07.23 17:52, Matthew Wilcox wrote:
> > On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
> > > Are you suggesting to break remap_pfn_range() into two stages
> > > (remap_pfn_range_prepare() then remap_pfn_range())?
> > > If so, there are many places remap_pfn_range() is called and IIUC all
> > > of them would need to use that 2-stage approach (lots of code churn).
> > > In addition, this is an exported function, so many more drivers might
> > > expect the current behavior.
> > 
> > You do not understand correctly.
> > 
> > When somebody calls mmap, there are two reasonable implementations.
> > Here's one:
> > 
> > 	.mmap = snd_dma_iram_mmap,
> > 
> > static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
> > 			     struct vm_area_struct *area)
> > {
> > 	area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
> > 	return remap_pfn_range(area, area->vm_start,
> > 			       dmab->addr >> PAGE_SHIFT,
> > 			       area->vm_end - area->vm_start,
> > 			       area->vm_page_prot);
> > }
> > 
> > This is _fine_.  It is not called from the fault path, it is called in
> > process context.  Few locks are held (which ones aren't even
> > documented!)
> > 
> > The other way is to set vma->vm_ops.  The fault handler in vm_ops
> > should not be calling remap_pfn_range().  It should be calling
> > set_ptes().  I almost have this driver fixed up, but I have another
> > meeting to go to now.
> 
> Just a note that we still have to make sure that the VMA flags will be set
> properly -- I guess at mmap time is the right time as I suggested above.
It actually does that already:

static int gru_file_mmap(struct file *file, struct vm_area_struct *vma)
{
	if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) !=
				(VM_SHARED | VM_WRITE))
		return -EPERM;

	if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) ||
				vma->vm_end & (GRU_GSEG_PAGESIZE - 1))
		return -EINVAL;

	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED |
			VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);

This compiles, but obviously I don't have a spare HP supercomputer
lying around for me to test whether it works.  Also set_ptes() was only
just introduced to the mm tree, so doing something that needs
backporting would take more effort (maybe having a private set_ptes()
in the driver would be a good backport option that calls set_pte_at()
in a loop).

diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
index 4eb4b9455139..c21bcb528f12 100644
--- a/drivers/misc/sgi-gru/grumain.c
+++ b/drivers/misc/sgi-gru/grumain.c
@@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
 	}
 
 	if (!gts->ts_gru) {
+		pte_t *ptep, pte;
+
 		STAT(load_user_context);
 		if (!gru_assign_gru_context(gts)) {
 			preempt_enable();
@@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
 		}
 		gru_load_context(gts);
 		paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum);
-		remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
-				paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE,
-				vma->vm_page_prot);
+
+		pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot);
+		ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE;
+		set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
+				ptep, pte_mkspecial(pte),
+				GRU_GSEG_PAGESIZE / PAGE_SIZE);
 	}
 
 	preempt_enable();