Re: Userspace notifications for observing userfaultfd faults

Kyle Huey <khuey@xxxxxxxxx> · Tue, 11 May 2021 11:25:07 -0700

On Tue, May 11, 2021 at 11:12 AM Axel Rasmussen
<axelrasmussen@xxxxxxxxxx> wrote:
>
> On Mon, May 10, 2021 at 5:38 PM Robert O'Callahan <roc@xxxxxxxxx> wrote:
> >
> > For rr (https://rr-project.org) to support recording and replaying
> > applications that use userfaultfd, we need to observe that a task we
> > are controlling has blocked on a userfault. Currently this is very
> > difficult to do, especially if a task blocks on a userfault on a page
> > where some other task has already triggered a userfault, so no new
> > userfaultfd event is generated. We also need to observe which page has
> > been faulted on so we can determine when the fault has been serviced
> > and the task is ready to run again.
> >
> > I've tried to find workarounds with existing APIs and it doesn't seem
> > tractable. See https://github.com/rr-debugger/rr/issues/2852#issuecomment-837514946
> > for some thoughts about that.
> >
> > It seems to me that a sufficient API for us would be a new software
> > perf event, e.g. PERF_COUNT_SW_USERFAULTS, with an associated
> > PERF_SAMPLE_ADDR that would give us the address of the page. Does that
> > sound like a reasonable thing to add?
>
> Is some combination of bpf and kprobes a possible solution? There are
> some seemingly relevant examples here:
> https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
>
> I haven't tried it, but it seems like attaching to handle_userfault()
> would give similar information to PERF_COUNT_SW_PAGE_FAULTS, but for
> userfaults.

My understanding is that using bpf/kprobes requires additional
permissions that rr does not need today and that our proposed perf
event would not need either.
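
To make the comparison concrete, here is a rough sketch of what the
consumer side of the proposal might look like. It is only an
illustration: PERF_COUNT_SW_USERFAULTS does not exist, so the sketch
uses the existing PERF_COUNT_SW_PAGE_FAULTS as a stand-in, and the
helper names init_fault_attr() and open_fault_event() are made up for
this example.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe a software event that emits one sample
 * per fault and records the faulting address in each sample.
 */
static void init_fault_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    /*
     * Stand-in only. The proposed PERF_COUNT_SW_USERFAULTS is not in
     * any kernel; an implementation would select it here instead.
     */
    attr->config = PERF_COUNT_SW_PAGE_FAULTS;
    attr->sample_period = 1;                       /* sample every event */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_ADDR;
    /* Lets unprivileged callers open the event at perf_event_paranoid=2. */
    attr->exclude_kernel = 1;
    attr->disabled = 1;                            /* enable via ioctl later */
}

/*
 * Hypothetical helper: open the event on one traced task, any CPU.
 * Returns a file descriptor, or -1 with errno set.
 */
static int open_fault_event(pid_t tid)
{
    struct perf_event_attr attr;

    init_fault_attr(&attr);
    return (int)syscall(SYS_perf_event_open, &attr, tid, -1, -1, 0);
}
```

The tracer would then mmap the returned fd and read PERF_RECORD_SAMPLE
records from the ring buffer to get the address for each event.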

- Kyle