On Thu, Nov 4, 2021 at 7:22 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote: > On Tue, Nov 02, 2021 at 01:29:32PM +0100, Andreas Gruenbacher wrote: > > Turn iov_iter_fault_in_readable into a function that returns the number > > of bytes not faulted in, similar to copy_to_user, instead of returning a > > non-zero value when any of the requested pages couldn't be faulted in. > > This supports the existing users that require all pages to be faulted in > > as well as new users that are happy if any pages can be faulted in. > > > > Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make > > sure this change doesn't silently break things. > > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx> > [...] > > diff --git a/mm/filemap.c b/mm/filemap.c > > index ff34f4087f87..4dd5edcd39fd 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -3757,7 +3757,7 @@ ssize_t generic_perform_write(struct file *file, > > * same page as we're writing to, without it being marked > > * up-to-date. > > */ > > - if (unlikely(iov_iter_fault_in_readable(i, bytes))) { > > + if (unlikely(fault_in_iov_iter_readable(i, bytes))) { > > status = -EFAULT; > > break; > > } > > Now that fault_in_iov_iter_readable() returns the number of bytes, we > could change the above test to: > > if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) { > > Assuming we have a pointer 'a', accessible, and 'a + PAGE_SIZE' unmapped: > > write(fd, a + PAGE_SIZE - 1, 2); > > can still copy one byte but it returns -EFAULT instead since the second > page is not accessible. > > While writing some test-cases for MTE (sub-page faults, 16-byte > granularity), we noticed that reading 2 bytes from 'a + 15' with > 'a + 16' tagged for faulting: > > write(fd, a + 15, 2); > > succeeds as long as 'a + 16' is not at a page boundary. Checking against > 'bytes' above makes this consistent. > > The downside is that it's an ABI change though not sure anyone is > relying on it. The same pattern exists in iomap_write_iter too, of course. In the very light testing I did for eliminating the pre-faulting, this kind of change was working fine. I have no performance numbers though. https://lore.kernel.org/linux-fsdevel/20211026094430.3669156-1-agruenba@xxxxxxxxxx/ https://lore.kernel.org/linux-fsdevel/20211027212138.3722977-1-agruenba@xxxxxxxxxx/ Thanks, Andreas