On Wed, Jun 17, 2020 at 01:15:31PM -0700, Linus Torvalds wrote: > On Wed, Jun 17, 2020 at 12:58 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > But currently remote GUP will still do the page fault accounting on the remote > > task_struct, am I right? E.g., when the get_user_pages_remote() is called with > > "tsk != current", it seems the faultin_page() will still do maj_flt/min_flt > > accounting for that remote task/thread? > > Well, that would be a data race and fundamentally buggy. > > It would be ok with something like ptrace (which only works when the > target is quiescent), but is completely wrong otherwise. > > I guess it works fine in practice, and it's only statistics so even if > you were to have a data race it doesn't much matter, but it's > definitely conceptually very very wrong. > > The fault stats should be about who does the fault (they are about the > _thread_) not about who the fault is done to (which is about the > _mm_). > > Allocating the fault data to somebody else sounds frankly silly and > stupid to me, exactly because it's (a) racy and (b) not even > conceptually correct. The other thread literally _isn't_ doing a major > page fault, for crissake! > > Now, there are some actual per-mm statistics too (the rss stuff etc), > and it's fundamentally harder exactly because of the shared data. See > the mm_counter stuff etc. Those are not about who does soemthing, they > are about the resulting MM state. I see. How about I move another small step further to cover GUP (to be explicit, faultin_page() and fixup_user_fault())? GUP needs the per-task accounting, but not the perf events. We can do that by slightly changing the new approach into: bool major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED); if (major) current->maj_flt++; else current->min_flt++; if (!regs) return ret; if (major) perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); else perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); With the change we will always account that onto current task. Since there're GUP calls that are not even accounted at all, e.g., get_user_pages_remote() with tsk==NULL: for these calls, we also account these page faults onto current task. Another major benefit I can see (besides a completely cleaned up accounting) is that we can remove the task_struct pointer in the whole GUP code, because AFAICT that's only used for this pf accounting either in faultin_page() or fixup_user_fault(), which seems questionable itself now to not use current... Any opinions? Thanks, -- Peter Xu