On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> Hi!
>
> On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > This patch series is in two parts:-
> >
> > 1. Currently there are a number of places in the kernel where we assume
> >    VM_SHARED implies that a mapping is writable. Let's be slightly less
> >    strict and relax this restriction in the case that VM_MAYWRITE is not
> >    set.
> >
> >    This should have no noticeable impact as the lack of VM_MAYWRITE implies
> >    that the mapping can not be made writable via mprotect() or any other
> >    means.
> >
> > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> >    The latter already clears the VM_MAYWRITE flag for a sealed read-only
> >    mapping, we simply extend this to F_SEAL_WRITE too.
> >
> >    For this to have effect, we must also invoke call_mmap() before
> >    mapping_map_writable().
> >
> > As this is quite a fundamental change on the assumptions around VM_SHARED
> > and since this causes a visible change to userland (in permitting read-only
> > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > to see if there is anything terribly wrong with it.
>
> So what I miss in this series is what the motivation is. Is it that you need
> to map F_SEAL_WRITE read-only? Why?
>

This originated from the discussion in [1], which refers to the bug reported
in [2]. Essentially the user is write-sealing a memfd then trying to mmap it
read-only, but receives an -EPERM error.

F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.

The fcntl() man page states:

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

So the kernel does not behave as the documentation states.

I took the user-supplied repro and slightly modified it, enclosed below.
After this patch series, this code works correctly.
I think there's definitely a case for the VM_MAYWRITE part of this patch series
even if the memfd bits are not considered useful, as we do seem to make the
implicit assumption that MAP_SHARED == writable even if !VM_MAYWRITE, which
seems odd.

Reproducer:-

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = memfd_create("test", MFD_ALLOW_SEALING);

	if (fd == -1) {
		perror("memfd_create");
		return EXIT_FAILURE;
	}

	if (write(fd, "test", 4) != 4) {
		perror("write");
		return EXIT_FAILURE;
	}

	/* Write-seal the memfd. */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == -1) {
		perror("fcntl");
		return EXIT_FAILURE;
	}

	/* Read-only shared mapping: fails with EPERM without this series. */
	void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);

	if (ret == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	return EXIT_SUCCESS;
}

[1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@xxxxxxxxxxxxxxxxxxxx/
[2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238

> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR