On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@xxxxxxxxxx wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=217238
> >
> >             Bug ID: 217238
> >            Summary: Creating shared read-only map is denied after add
> >                     write seal to a memfd
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 6.2.8
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
> >           Reporter: yshuiv7@xxxxxxxxx
> >         Regression: No
> >
> > Test case:
> >
> >     int main() {
> >         int fd = memfd_create("test", MFD_ALLOW_SEALING);
> >         write(fd, "test", 4);
> >         fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
> >
> >         void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
> >     }
> >
> > This fails with EPERM. This is in contradiction with what's described in the
> > documentation of F_SEAL_WRITE.
> >
> > --
> > You may reply to this email to add a comment.
> >
> > You are receiving this mail because:
> > You are the assignee for the bug.
>

This issue seems to result from the memfd using the i_mmap_writable
field of its shmem page cache object (struct address_space) to denote
whether it is write-sealed. Adding F_SEAL_WRITE calls
mapping_deny_writable(), which leaves i_mmap_writable negative. mmap()
of a VM_SHARED mapping then calls mapping_map_writable(), which fails
with -EPERM.

The kernel assumes that a VM_SHARED mapping might become writable at
any time via mprotect(). It therefore treats VM_SHARED mappings as if
they were writable as far as i_mmap_writable is concerned.

(This field's primary use is to determine whether, on architectures
that require it, flushing must occur to avoid aliasing when the mapping
is writably mapped; see filemap_read() for an example.)

In theory we could convert all such checks to VM_SHARED | VM_WRITE
(importantly, including on fork). We could then update mprotect() to
check mapping_map_writable() if a user tries to make unwritable memory
writable.
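Below is a minimal, self-contained sketch of the reported scenario. It assumes Linux with glibc >= 2.27 for memfd_create(). The helpers make_sealed_memfd() and try_ro_map() are hypothetical names used only for illustration.

Per the report, the MAP_SHARED, PROT_READ mapping fails with EPERM on 6.2.8. A MAP_PRIVATE, PROT_READ mapping of the same write-sealed memfd is not VM_SHARED, so it does not go through mapping_map_writable(). The exact result can differ between kernel versions, so the sketch returns the errno rather than assuming it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a sealable memfd holding "test", then
 * apply `seals` with F_ADD_SEALS. Returns the fd, or -1 on failure. */
static int make_sealed_memfd(unsigned int seals)
{
	int fd = memfd_create("test", MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (write(fd, "test", 4) != 4 || fcntl(fd, F_ADD_SEALS, seals) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: attempt a 4-byte PROT_READ mapping of `fd` using
 * `map_type` (MAP_SHARED or MAP_PRIVATE). Returns 0 if mmap() succeeded
 * (the mapping is then removed again), otherwise the errno it reported. */
static int try_ro_map(int fd, int map_type)
{
	void *p = mmap(NULL, 4, PROT_READ, map_type, fd, 0);

	if (p == MAP_FAILED)
		return errno;
	munmap(p, 4);
	return 0;
}
```

For example, take fd = make_sealed_memfd(F_SEAL_WRITE). Then try_ro_map(fd, MAP_PRIVATE) should return 0. try_ro_map(fd, MAP_SHARED) corresponds to the reported test case, which per the report returns EPERM on 6.2.8.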
I suspect, however, that there are locking-related reasons that make
converting these checks unreasonable, but I may be mistaken (others
might have some insight on this). I also see some complexity around
this in the security checks on marking shared writable mappings
executable (e.g. in mmap_violation_check()).

In any case, it doesn't really make much sense to have a write-sealed
shared mapping. You're essentially saying 'nothing _at all_ can write to
this', so it may as well be private. The semantics are unfortunate
here, but MAP_PRIVATE mappings will still share the same underlying
memory read-only.

A better choice here might be F_SEAL_FUTURE_WRITE (available from
kernel >= 5.1). It does permit shared read-only mappings, as this is
explicitly checked for in seal_check_future_write(), invoked from
shmem_mmap().

Regardless, I think the conclusion is that this is not a bug, but
rather that the documentation needs to be updated.
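Below is a sketch of the F_SEAL_FUTURE_WRITE alternative. It assumes Linux >= 5.1 and glibc >= 2.27. The fallback #define uses the value from <linux/fcntl.h>, in case the libc headers predate the flag. make_future_sealed_memfd() and try_shared_map() are hypothetical helper names.

In this sketch, the creator keeps a shared writable mapping taken before sealing. New shared mappings are then only accepted read-only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010 /* from <linux/fcntl.h>; Linux >= 5.1 */
#endif

/* Hypothetical helper: create a memfd of `len` bytes, map it shared and
 * writable for the creator, then apply F_SEAL_FUTURE_WRITE. Returns the
 * fd and stores the creator's mapping in *writer, or returns -1. */
static int make_future_sealed_memfd(size_t len, char **writer)
{
	void *p;
	int fd = memfd_create("test", MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto fail;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto fail;
	/* Unlike F_SEAL_WRITE, this seal does not require that no shared
	 * writable mappings exist, so the creator's mapping survives. */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0) {
		munmap(p, len);
		goto fail;
	}
	*writer = p;
	return fd;
fail:
	close(fd);
	return -1;
}

/* Hypothetical helper: attempt a MAP_SHARED mapping of `fd` with `prot`.
 * Returns 0 on success (unmapping it again), otherwise mmap()'s errno. */
static int try_shared_map(int fd, size_t len, int prot)
{
	void *p = mmap(NULL, len, prot, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED)
		return errno;
	munmap(p, len);
	return 0;
}
```

With this seal, a consumer can map the memfd MAP_SHARED, PROT_READ as in the original test case. An attempt to map it shared and writable is refused. So is an attempt to mprotect() such a read-only shared mapping to writable, because seal_check_future_write() clears VM_MAYWRITE on it.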