On Tue, May 11, 2021 at 11:12 AM Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote:
>
> On Mon, May 10, 2021 at 5:38 PM Robert O'Callahan <roc@xxxxxxxxx> wrote:
> >
> > For rr (https://rr-project.org) to support recording and replaying
> > applications that use userfaultfd, we need to observe that a task we
> > are controlling has blocked on a userfault. Currently this is very
> > difficult to do, especially if a task blocks on a userfault on a page
> > where some other task has already triggered a userfault, so no new
> > userfaultfd event is generated. We also need to observe which page has
> > been faulted on so we can determine when the fault has been serviced
> > and the task is ready to run again.
> >
> > I've tried to find workarounds with existing APIs and it doesn't seem
> > tractable. See https://github.com/rr-debugger/rr/issues/2852#issuecomment-837514946
> > for some thoughts about that.
> >
> > It seems to me that a sufficient API for us would be a new software
> > perf event, e.g. PERF_COUNT_SW_USERFAULTS, with an associated
> > PERF_SAMPLE_ADDR that would give us the address of the page. Does that
> > sounds like a reasonable thing to add?
>
> Is some combination of bpf and kprobes a possible solution? There are
> some seemingly relevant examples here:
> https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
>
> I haven't tried it, but it seems like attaching to handle_userfault()
> would give similar information to perf_count_sw_page_faults, but for
> userfaults.

My understanding is that using bpf/kprobes requires new permissions
that are both not currently required by rr and would not be required
by our proposed solution.

- Kyle