On Wed, Sep 11, 2019 at 8:11 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > This is the gup counterpart of the change that allows the > VM_FAULT_RETRY to happen for more than once. One thing to mention is > that we must check the fatal signal here before retry because the GUP > can be interrupted by that, otherwise we can loop forever. I still get nervous about the signal handling here. I'm not entirely sure we get it right even before your patch series. Right now, __get_user_pages() can return -ERESTARTSYS when it's killed: /* * If we have a pending SIGKILL, don't keep faulting pages and * potentially allocating memory. */ if (fatal_signal_pending(current)) { ret = -ERESTARTSYS; goto out; } and I don't think your series changes that. And note how this is true _regardless_ of any FOLL_xyz flags (and we don't pass the FAULT_FLAG_xyz flags at all, they get generated deeper down if we actually end up faulting things in). So this part of the patch: + if (fatal_signal_pending(current)) + goto out; + *locked = 1; - lock_dropped = true; down_read(&mm->mmap_sem); ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, - pages, NULL, NULL); + pages, NULL, locked); + if (!*locked) { + /* Continue to retry until we succeeded */ + BUG_ON(ret != 0); + goto retry; just makes me go "that can't be right". The fatal_signal_pending() is pointless and would probably better be something like if (down_read_killable(&mm->mmap_sem) < 0) goto out; and then _after_ calling __get_user_pages(), the whole "negative error handling" should be more obvious. The BUG_ON(ret != 0) makes me nervous, but it might be fine (I guess the fatal signal handling has always been done before the lock is released?). But exactly *because* __get_user_pages() can already return on fatal signals, I think it should also set FAULT_FLAG_KILLABLE when faulting things in. I don't think it does right now, so it doesn't actually necessarily check fatal signals in a timely manner (not _during_ the fault, only _before_ it). See what I'm reacting to? And maybe I'm wrong. Maybe I misread the code, and your changes. And at least some of these logic error predate your changes, I just was hoping that since this whole "react to signals" is very much what your patch series is working on, you'd look at this. Ok? Linus