Hello, Hugh, Axel, Sorry for a very late reply after a long PTO.. On Thu, Jul 13, 2023 at 07:40:30PM -0700, Hugh Dickins wrote: > On Wed, 12 Jul 2023, Axel Rasmussen wrote: > > On Tue, Jul 11, 2023 at 6:27 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > > > > Smatch has observed that pte_offset_map_lock() is now allowed to fail, > > > and then ptl should not be unlocked. Use -EAGAIN here like elsewhere. > > > > > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > > > > Looks correct to me, thanks Hugh! If it's useful, feel free to take: > > > > Reviewed-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx> > > Thanks, that will have helped Andrew (but would vanish into the merge > with your original when it moves from mm-unstable to mm-stable). > > > > > > --- > > > Axel, Peter: this seems right as a fix to the patch in mm-unstable; > > > but in preparing this, I noticed mfill_atomic()'s code before calling > > > mfill_atomic_pte(), and think that my original choice of -EFAULT was > > > therefore better than -EAGAIN for all of these; and that mfill_atomic()'s > > > BUG_ONs there would be better deleted (and is its BUG_ON(folio) safe??). > > > Something one of us should address, after this fixup is in akpm's tree. > > > > Agreed, dealing with the BUG_ONs is something I have been meaning to > > do as checkpatch bothers me about it frequently. > > > > What this code is trying to do is to enforce the contract that: if > > mfill_atomic_pte failed, then it must have released *folio and set it > > to NULL. But, you're exactly right that if we had such a bug, it would > > mean we leaked one page in an unusual error case - not great, but not > > worth crashing the kernel either. Yes I also agree the BUG_ON might be a bit too aggressive, even if still accurate (so we should really not trigger that). WARN_ON_ONCE() should be enough. > > > > For UFFDIO_POISON this doesn't really apply because it doesn't > > allocate or get a reference to a page in any case. > > > > For other cases like UFFDIO_COPY where it does matter, I think it's > > fine as is regardless of the error code returned (as long as it's not > > -ENOENT :) ). mfill_atomic_pte_copy() is sure to set `*foliop = NULL` > > before it attempts this lock (in mfill_atomic_install_pte). So -EAGAIN > > or -EFAULT should work equally well at least from that perspective. I > > somewhat prefer -EAGAIN, but maybe I'm missing something. > > Right, when I said -EFAULT better than -EAGAIN, I was only basing that > on -EFAULT being the code which was already used for pmd_trans_huge() > in the check above mfill_atomic()'s call to mfill_atomic_pte(). > > I don't disagree with you, and Peter too preferred -EAGAIN: it's just > better if they all use the same code (or maybe mfill_atomic()'s first > check for pmd_trans_huge() can be deleted along with the BUG_ONs - but > I have not checked through the cases, there may be very good reason to > keep that preliminary check there). It's just that -EFAULT will definitely fail most of the uffd based apps, because no app can handle an -EFAULT over any ioctl(UFFDIO_*). So if we'd delete the 1st pmd_trans_huge() check we may want to make the 2nd return -EAGAIN. I think this -EFAULT (of 2nd pmd_trans_huge() check) should really also be an -EAGAIN, I'd bet ten dollars if it's really hit somewhere before it'll crash the app as explained. With Hugh's recent rework on pte_offset_map_lock() (which should contain a thp check within the map_lock()) IMHO we can safely drop the 2nd check as a whole (while keeping the 1st check as a fast path to detect existing thps), and then we should reliably return an -EAGAIN if a thp raced with us, so the userapp should just retry when race with anything else. Thanks, -- Peter Xu