On Mon, Oct 11, 2021 at 10:38 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote: > > I cleaned up this patch [1] but I realised it still doesn't solve it. > The arm64 __copy_to_user_inatomic(), while ensuring progress if called > in a loop, it does not guarantee precise copy to the fault position. That should be ok., We've always allowed the user copy to return early if it does word copies and hits a page crosser that causes a fault. Any user then has the choice of: - partial copies are bad - partial copies are handled and then you retry from the place copy_to_user() failed at and in that second case, the next time around, you'll get the fault immediately (or you'll make some more progress - maybe the user copy loop did something different just because the length and/or alignment was different). If you get the fault immediately, that's -EFAULT. And if you make some more progress, it's again up to the caller to rinse and repeat. End result: user copy functions do not have to report errors exactly. It is the caller that has to handle the situation. Most importantly: "exact error or not" doesn't actually _matter_ to the caller. If the caller does the right thing for an exact error, it will do the right thing for an inexact one too. See above. > The copy_to_sk(), after returning an error, starts again from the previous > sizeof(sh) boundary rather than from where the __copy_to_user_inatomic() > stopped. So it can get stuck attempting to copy the same search header. That seems to be purely a copy_to_sk() bug. Or rather, it looks like a bug in the caller. copy_to_sk() itself does if (copy_to_user_nofault(ubuf + *sk_offset, &sh, sizeof(sh))) { ret = 0; goto out; } and the comment says * 0: all items from this leaf copied, continue with next but that comment is then obviously not actually true in that it's not "continue with next" at all. But this is all very much a bug in the btrfs search_ioctl()/copy_to_sk() code: it simply doesn't do the proper thing for a partial result. Because no, "just retry the whole thing" is by definition not the proper thing. That said, I think that if we can have faults at non-page-aligned boundaries, then we just need to make fault_in_pages_writeable() check non-page boundaries. > An ugly fix is to fall back to byte by byte copying so that we can > attempt the actual fault address in fault_in_pages_writeable(). No, changing the user copy machinery is wrong. The caller really has to do the right thing with partial results. And I think we need to make fault_in_pages_writeable() match the actual faulting cases - maybe remote the "pages" part of the name? That would fix the btrfs code - it's not doing the right thing as-is, but it's "close enough' to right that I think fixing fault_in_pages_writeable() should fix it. No? Linus