On Thu, Oct 21, 2021 at 04:30:30PM -1000, Linus Torvalds wrote:
> On Thu, Oct 21, 2021 at 4:42 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> > But probing the entire memory range in fault domain granularity in the
> > page fault-in functions still doesn't actually make sense. Those
> > functions really only need to guarantee that we'll be able to make
> > progress eventually. From that point of view, it should be enough to
> > probe the first byte of the requested memory range
>
> That's probably fine.
>
> Although it should be more than one byte - "copy_from_user()" might do
> word-at-a-time optimizations, so you could have an infinite loop of
>
>  (a) copy_from_user() fails because the chunk it tried to get failed partly
>
>  (b) fault_in() probing succeeds, because the beginning part is fine
>
> so I agree that the fault-in code doesn't need to do the whole area,
> but it needs to at least do some <N bytes, up to length> thing, to
> handle the situation where the copy_to/from_user requires more than a
> single byte.

From a discussion with Al some months ago, if there are bytes still
accessible, copy_from_user() is not allowed to fail fully (i.e. return
the requested copy size) even when it uses word-at-a-time. In the worst
case, it should return size - 1. If the fault_in() then continues
probing from uaddr + 1, it should eventually hit the faulty address.

The problem appears when fault_in() restarts from uaddr rather than
where copy_from_user() stopped. That's what the btrfs search_ioctl()
does. I also need to check the direct I/O cases that Andreas mentioned,
maybe they can be changed not to attempt the fault_in() from the
beginning of the block.

-- 
Catalin
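
To illustrate the difference between restarting the probe from uaddr and
resuming where the copy stopped, here is a rough user-space sketch. It is
not kernel code: sim_copy_from_user(), sim_fault_in(), struct sim_user and
MAX_TRIES are made-up names for this example only, the "user buffer" is
just an offset at which accesses start faulting, and the fault-in helper
is assumed to probe only the first byte of the range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_TRIES 100	/* cap so the non-progressing variant ends in this demo */

/* Simulated user range: offsets [0, fault_at) are accessible, the rest fault. */
struct sim_user {
	size_t fault_at;
};

/* Like copy_from_user(): returns the number of bytes NOT copied. */
static size_t sim_copy_from_user(const struct sim_user *u, size_t off, size_t size)
{
	size_t copied = 0;

	while (copied < size && off + copied < u->fault_at)
		copied++;
	return size - copied;
}

/* Assumed minimal fault-in: probe only the first byte of the range. */
static int sim_fault_in(const struct sim_user *u, size_t off)
{
	return off < u->fault_at ? 0 : -EFAULT;
}

/* Variant 1: always copy and probe from the start of the range. */
static int copy_restart_from_start(const struct sim_user *u, size_t size)
{
	for (int tries = 0; tries < MAX_TRIES; tries++) {
		if (sim_copy_from_user(u, 0, size) == 0)
			return 0;
		/* First byte is fine, so the probe succeeds and we retry. */
		if (sim_fault_in(u, 0))
			return -EFAULT;
	}
	return -EAGAIN;	/* stands in for "would loop forever" */
}

/* Variant 2: resume copying and probing where the last copy stopped. */
static int copy_resume(const struct sim_user *u, size_t size)
{
	size_t off = 0;

	for (int tries = 0; tries < MAX_TRIES; tries++) {
		size_t left = sim_copy_from_user(u, off, size - off);

		off = size - left;	/* advance past the bytes that were copied */
		assert(off <= size);
		if (left == 0)
			return 0;
		/* The probe now starts at the first byte that was not copied. */
		if (sim_fault_in(u, off))
			return -EFAULT;
	}
	return -EAGAIN;
}
```

With a 16-byte request and a fault at offset 5, copy_restart_from_start()
never makes progress and only stops because of the artificial retry cap,
while copy_resume() reports -EFAULT after the first partial copy.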