On Wed 11-10-23 19:14:10, Lorenzo Stoakes wrote: > On Wed, Oct 11, 2023 at 11:46:27AM +0200, Jan Kara wrote: > > On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote: > > > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to > > > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either > > > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so. > > > > > > We would otherwise fail the mapping_map_writable() check before we had > > > the opportunity to clear VM_MAYWRITE. > > > > > > However, the existing logic in mmap_region() performs this check BEFORE > > > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce > > > this check AFTER the function call. > > > > > > In order to avoid any risk of breaking call_mmap() handlers which assume > > > this will have been done first, we continue to mark the file writable > > > first, simply deferring enforcement of it failing until afterwards. > > > > > > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's > > > sealed via F_SEAL_WRITE to succeed, whereas previously they were not > > > permitted. > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238 > > > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx> > > > > ... > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 6f6856b3267a..9fbee92aaaee 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr, > > > vma->vm_pgoff = pgoff; > > > > > > if (file) { > > > - if (is_shared_maywrite(vm_flags)) { > > > - error = mapping_map_writable(file->f_mapping); > > > - if (error) > > > - goto free_vma; > > > - } > > > + int writable_error = 0; > > > + > > > + if (vma_is_shared_maywrite(vma)) > > > + writable_error = mapping_map_writable(file->f_mapping); > > > > > > vma->vm_file = get_file(file); > > > error = call_mmap(file, vma); > > > if (error) > > > goto unmap_and_free_vma; > > > > > > + /* > > > + * call_mmap() may have changed VMA flags, so retry this check > > > + * if it failed before. > > > + */ > > > + if (writable_error && vma_is_shared_maywrite(vma)) { > > > + error = writable_error; > > > + goto close_and_free_vma; > > > + } > > > > Hum, this doesn't quite give me a peace of mind ;). One bug I can see is > > that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop > > i_mmap_writeable counter here? > > This wouldn't be applicable in the F_SEAL_WRITE case, as the > i_mmap_writable counter would already have been decremented, and thus an > error would arise causing no further decrement, and everything would work > fine. > > It'd be very odd for something to be writable here but the driver to make > it not writable. But we do need to account for this. Yeah, it may be odd but this is indeed what i915 driver appears to be doing in i915_gem_object_mmap(): if (i915_gem_object_is_readonly(obj)) { if (vma->vm_flags & VM_WRITE) { i915_gem_object_put(obj); return -EINVAL; } vm_flags_clear(vma, VM_MAYWRITE); } > > I've checked why your v2 version broke i915 and I think the reason maybe > > has nothing to do with i915. Just in case call_mmap() failed, it ended up > > jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we > > didn't call mapping_map_writable() yet so the counter became imbalanced. > > yeah that must be the cause, I thought perhaps somehow > __remove_shared_vm_struct() got invoked by i915_gem_mmap() but I didn't > trace it through to see if it was possible. > > Looking at it again, i don't think that is possible, as we hold a mmap/vma > write lock, and the only operations that can cause > __remove_shared_vm_struct() to run are things that would not be able to do > so with this lock held. > > > So I'd be for returning to v2 version, just fix up the error handling > > paths... > > So in conclusion, I agree, this is the better approach. Will respin in v4. Thanks! Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR