rr, a userspace record and replay debugger[0], replays asynchronous events such as signals and context switches by essentially[1] setting a breakpoint at the address where the asynchronous event was delivered during recording, with a condition that the program state matches the state when the event was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the supervisor, and evaluates the condition from the supervisor. If the asynchronous event is delivered in a tight loop (thus requiring the breakpoint condition to be evaluated repeatedly), the overhead can be immense. A patch to rr that uses hardware breakpoints via perf events, with an attached BPF program that rejects breakpoint hits where the condition is not satisfied, reduces rr's replay overhead by 94% on a pathological (but real, customer-provided, not contrived) rr trace.

The only obstacle to this approach is that while the kernel allows a BPF program to suppress sample output when a perf event overflows, it does not suppress signalling the perf event fd. This appears to be a simple oversight in the code. This patch set reorders the overflow handler callback and the side effects of perf event overflow to allow an overflow handler to suppress all side effects, changes bpf_overflow_handler() to suppress those side effects if the BPF program returns zero, and adds a selftest.

The previous version of this patchset can be found at
https://lore.kernel.org/linux-kernel/20231204201406.341074-1-khuey@xxxxxxxxxxxx/

Changes since v1:

Patch 1 was added so that a sample suppressed by this mechanism will neither generate SIGTRAPs nor count against the event limit.

Patch 2 is v1's patch 1.

Patch 3 is v1's patch 2. It addresses a number of review comments about the selftest and adds testing for the behavior introduced by patch 1.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible before setting a breakpoint, and to determine a set of program state that is practical to check and verify.
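
For readers who want the intended ordering spelled out, the following is a minimal userspace model of the behavior described above, not kernel code. Every name in it (model_regs, model_event, model_overflow, match_state, model_tight_loop) is hypothetical and exists only for illustration. The one property it is meant to show is that the filter runs before any side effect, so a zero return skips the sample, the fd wakeup, and the event-limit accounting alike.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not kernel types. */
struct model_regs {
    uint64_t ip;       /* address of the breakpoint hit */
    uint64_t counter;  /* stands in for "the rest of the program state" */
};

struct model_event {
    /* Plays the role of the attached BPF program: zero means "reject". */
    int (*filter)(const struct model_regs *regs, const void *want);
    const void *want;        /* state the filter compares against */
    unsigned int samples;    /* sample records that would be written */
    unsigned int fd_wakeups; /* times the event fd would be signalled */
    int limit_left;          /* remaining accepted hits; <= 0 means no limit */
    bool disabled;           /* set once the limit is exhausted */
};

/* Returns true if the hit was delivered, false if it was suppressed. */
static bool model_overflow(struct model_event *ev,
                           const struct model_regs *regs)
{
    assert(ev != NULL && regs != NULL);

    /* The filter runs first, so a zero return skips everything below. */
    if (ev->filter && ev->filter(regs, ev->want) == 0)
        return false;

    /* Side effects happen only for accepted hits. */
    if (ev->limit_left > 0 && --ev->limit_left == 0)
        ev->disabled = true;
    ev->samples++;
    ev->fd_wakeups++;
    return true;
}

/* Example filter: accept only when the state matches the wanted state. */
static int match_state(const struct model_regs *regs, const void *want)
{
    const struct model_regs *w = want;

    return regs->ip == w->ip && regs->counter == w->counter;
}

/* Simulate a tight loop that hits the same address on every iteration. */
static unsigned int model_tight_loop(struct model_event *ev, uint64_t ip,
                                     uint64_t iterations)
{
    unsigned int delivered = 0;

    for (uint64_t i = 0; i < iterations; i++) {
        struct model_regs regs = { .ip = ip, .counter = i };

        if (model_overflow(ev, &regs))
            delivered++;
    }
    return delivered;
}
```

In this model, running model_tight_loop() for ten iterations with match_state wanting counter == 7 delivers exactly one hit. The other nine are rejected by the filter without touching samples, fd_wakeups, or limit_left, which is the property the supervisor relies on to avoid being woken for every iteration.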