On Tue, Jun 27, 2017 at 10:35 PM, Jin, Yao <yao.jin@xxxxxxxxxxxxxxx> wrote: > Hi Kyle, > > I understand your requirement. Sorry I don't know the detail of rr debugger, > but I guess if it just uses counter overflow to deliver a signal, could it > set the counter without "exclude_kernel"? Unfortunately we cannot. We depend on the counter value being exactly the same between recording and replay, and dropping `exclude_kernel` would introduce non-determinism. > Another consideration is, I'm not sure if the kernel can accurately realize > that if the interrupt is to trigger sampling or just deliver a signal. > Userspace may know that but kernel may not. After looking at this code a bit more, I think that changing the `is_sample_allowed` check from an early return to a guard around the invocation of `overflow_handler` would fix this. I believe, but have not tested, that `perf_event_fasync` is what must run to deliver our signal, while the `overflow_handler` is what copies the kernel RIP/etc into the output buffer that you want to skip. - Kyle > Anyway let's listen to more comments from community. > > Thanks > > Jin Yao > > > > On 6/28/2017 12:51 PM, Kyle Huey wrote: >> >> On Tue, Jun 27, 2017 at 7:09 PM, Jin, Yao <yao.jin@xxxxxxxxxxxxxxx> wrote: >>> >>> Hi, >>> >>> In theory, the PMI interrupts in skid region should be dropped, right? >> >> No, why would they be dropped? >> >> My understanding of the situation is as follows: >> >> There is some time, call it t_0, where the hardware counter overflows. >> The PMU triggers an interrupt, but this is not instantaneous. Call >> the time when the interrupt is actually delivered t_1. Then t_1 - t_0 >> is the "skid". >> >> Note that if the counter is `exclude_kernel`, then at t_0 the CPU >> *must* be running a userspace program. But by t_1, the CPU may be >> doing something else. Your patch changed things so that if at t_1 the >> CPU is in the kernel, then the interrupt is discarded. But rr has >> programmed the counter to deliver a signal on overflow (via F_SETSIG >> on the fd returned by perf_event_open). This change results in the >> signal never being delivered, because the interrupt was ignored. >> (More accurately, the signal is delivered the *next* time the counter >> overflows, which is far past where we wanted to inject our >> asynchronous event into our tracee. >> >> It seems to me that it might be reasonable to ignore the interrupt if >> the purpose of the interrupt is to trigger sampling of the CPUs >> register state. But if the interrupt will trigger some other >> operation, such as a signal on an fd, then there's no reason to drop >> it. >> >>> For a userspace debugger, is it the only choice that relies on the *skid* >>> PMI interrupt? >> >> I don't understand this question, but hopefully the above clarified >> things. >> >> - Kyle >> >>> Thanks >>> Jin Yao >>> >>> >>> On 6/28/2017 9:01 AM, Kyle Huey wrote: >>>> >>>> Sent again with LKML CCd, sorry for the noise. >>>> >>>> - Kyle >>>> >>>> On Tue, Jun 27, 2017 at 5:38 PM, Kyle Huey <me@xxxxxxxxxxxx> wrote: >>>>> >>>>> cc1582c231ea introduced a regression in v4.12.0-rc5, and appears to be >>>>> a candidate for backporting to stable branches. >>>>> >>>>> rr, a userspace record and replay debugger[0], uses the PMU interrupt >>>>> to stop a program during replay to inject asynchronous events such as >>>>> signals. We are counting retired conditional branches in userspace >>>>> only. This changeset causes the kernel to drop interrupts on the >>>>> floor if, during the PMU interrupt's "skid" region, the CPU enters >>>>> kernel mode for whatever reason. When replaying traces of complex >>>>> programs such as Firefox, we intermittently fail to deliver >>>>> asynchronous events on time, leading the replay to diverge from the >>>>> recorded state. >>>>> >>>>> It seems like this change should, at a bare minimum, be limited to >>>>> counters that actually perform sampling of register state when the >>>>> interrupt fires. In our case, with the retired conditional branches >>>>> counter restricted to counting userspace events only, it makes no >>>>> difference that the PMU interrupt happened to be delivered in the >>>>> kernel. >>>>> >>>>> As this makes rr unusable on complex applications and cannot be >>>>> efficiently worked around, we would appreciate this being addressed >>>>> before 4.12 is finalized, and the regression not being introduced to >>>>> stable branches. >>>>> >>>>> Thanks, >>>>> >>>>> - Kyle >>>>> >>>>> [0] http://rr-project.org/ >>> >>> >