Hi David, Thanks for your quick response. On Wed, Apr 24, 2024 at 05:26:39PM +0200, David Hildenbrand wrote: > > I gave it some more thought, and I think we are still missing something (I > wish PFNMAP/MIXEDMAP wouldn't be that hard). > > > + > > +/* > > + * +--------------+ pgoff == 0 > > + * | meta page | > > + * +--------------+ pgoff == 1 > > + * | subbuffer 0 | > > + * | | > > + * +--------------+ pgoff == (1 + (1 << subbuf_order)) > > + * | subbuffer 1 | > > + * | | > > + * ... > > + */ > > +#ifdef CONFIG_MMU > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer, > > + struct vm_area_struct *vma) > > +{ > > + unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff; > > + unsigned int subbuf_pages, subbuf_order; > > + struct page **pages; > > + int p = 0, s = 0; > > + int err; > > + > > I'd add some comments here like > > /* Refuse any MAP_PRIVATE or writable mappings. */ > > + if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC || > > + !(vma->vm_flags & VM_MAYSHARE)) > > + return -EPERM; > > + > > /* > * Make sure the mapping cannot become writable later. Also, tell the VM > * to not touch these pages pages (VM_DONTCOPY | VM_DONTDUMP) and tell > * GUP to leave them alone as well (VM_IO). > */ > > + vm_flags_mod(vma, > > + VM_MIXEDMAP | VM_PFNMAP | > > + VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO, > > + VM_MAYWRITE); > > I am still really unsure about VM_PFNMAP ... it's not a PFNMAP at all and, > as stated, vm_insert_pages() even complains quite a lot when it would have > to set VM_MIXEDMAP and VM_PFNMAP is already set, likely for a very good > reason. > > Can't we limit ourselves to VM_IO? > > But then, I wonder if it really helps much regarding GUP: yes, it blocks > ordinary GUP (see check_vma_flags()) but as insert_page_into_pte_locked() > does *not* set pte_special(), GUP-fast (gup_fast_pte_range()) will not > reject it. > > Really, if you want GUP-fast to reject it, remap_pfn_range() and friends are > the way to go, that will set pte_special() such that also GUP-fast will > leave it alone, just like vm_normal_page() would. > > So ... I know Linus recommended VM_PFNMAP/VM_IO to stop GUP, but it alone > won't stop all of GUP. We really have to mark the PTE as special, which > vm_insert_page() must not do (because it is refcounted!). Hum, apologies, I am not sure to follow the connection here. Why do you think the recommendation was to prevent GUP? > > Which means: do we really have to stop GUP from grabbing that page? > > Using vm_insert_page() only with VM_MIXEDMAP (and without VM_PFNMAP|VM_IO) > would be better. Under the assumption we do not want to stop all GUP, why not using VM_IO over VM_MIXEDMAP which is I believe more restrictive? > > If we want to stop all of GUP, remap_pfn_range() currently seems unavoidable > :( > > > > + > > + lockdep_assert_held(&cpu_buffer->mapping_lock); > > + > > + subbuf_order = cpu_buffer->buffer->subbuf_order; > > + subbuf_pages = 1 << subbuf_order; > > + > > + nr_subbufs = cpu_buffer->nr_pages + 1; /* + reader-subbuf */ > > + nr_pages = ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */ > > + > > + vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; > > + if (!vma_pages || vma_pages > nr_pages) > > + return -EINVAL; > > + > > + nr_pages = vma_pages; > > + > > + pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL); > > + if (!pages) > > + return -ENOMEM; > > + > > + if (!pgoff) { > > + pages[p++] = virt_to_page(cpu_buffer->meta_page); > > + > > + /* > > + * TODO: Align sub-buffers on their size, once > > + * vm_insert_pages() supports the zero-page. > > + */ > > + } else { > > + /* Skip the meta-page */ > > + pgoff--; > > + > > + if (pgoff % subbuf_pages) { > > + err = -EINVAL; > > + goto out; > > + } > > + > > + s += pgoff / subbuf_pages; > > + } > > + > > + while (s < nr_subbufs && p < nr_pages) { > > + struct page *page = virt_to_page(cpu_buffer->subbuf_ids[s]); > > + int off = 0; > > + > > + for (; off < (1 << (subbuf_order)); off++, page++) { > > + if (p >= nr_pages) > > + break; > > + > > + pages[p++] = page; > > + } > > + s++; > > + } > > + > > + err = vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); > > vm_insert_pages() documents: "In case of error, we may have mapped a subset > of the provided pages. It is the caller's responsibility to account for this > case." > > Which could for example happen, when allocating a page table fails. > > Would we able to deal with that here? As we are in the mmap path, on an error, I would expect the vma to be destroyed and those pages whom insertion succeeded to be unmapped? But perhaps shall we proactively zap_page_range_single()? > > > Again, I wish it would all be easier ... > > -- > Cheers, > > David / dhildenb > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@xxxxxxxxxxx. >