On Tue, Aug 03, 2021 at 11:20:20AM -0400, Liang, Kan wrote:
>
>
> On 8/3/2021 10:55 AM, Peter Zijlstra wrote:
> > On Tue, Aug 03, 2021 at 06:25:28AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
> > > From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> > >
> > > A warning as below may be occasionally triggered in an ADL machine when
> > > these conditions occur,
> > >   - Two perf record commands run one by one. Both record a PEBS event.
> > >   - Both runs on small cores.
> > >   - They have different adaptive PEBS configuration (PEBS_DATA_CFG).
> > >
> > > [  673.663291] WARNING: CPU: 4 PID: 9874 at
> > > arch/x86/events/intel/ds.c:1743
> > > setup_pebs_adaptive_sample_data+0x55e/0x5b0
> > > [  673.663348] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0
> > > [  673.663357] Call Trace:
> > > [  673.663357]  <NMI>
> > > [  673.663357]  intel_pmu_drain_pebs_icl+0x48b/0x810
> > > [  673.663360]  perf_event_nmi_handler+0x41/0x80
> > > [  673.663368]  </NMI>
> > > [  673.663370]  __perf_event_task_sched_in+0x2c2/0x3a0
> > >
> > > Different from the big core, the small core requires the ACK right
> > > before re-enabling counters in the NMI handler, otherwise a stale PEBS
> > > record may be dumped into the later NMI handler, which trigger the
> > > warning.
> > >
> > > Add a new mid_ack flag to track the case. Add all PMI handler bits in
> > > the struct x86_hybrid_pmu to track the bits for different types of PMUs.
> > > Apply mid ACK for the small cores on an Alder Lake machine.
> >
> > Why do we need a new option? Why isn't early (as in not late) good
> > enough?
> >
>
> The early ACK can fix this issue, however it triggers a spurious NMI during
> the stress test. I'm told to do the ACK right before re-enabling counters
> for small cores. That indeed fixes all the issues.

Any chance that would also work for the chips that now use late_ack? I'm
just (desperately) trying to minimize the amount of quirks here ;-)
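
For reference, the three ACK placements discussed in this thread (early,
mid, late) can be sketched as a small user-space model. This is not the
kernel's handler. Every name in it (ack_mode, pmi_step, pmi_handler_order)
is hypothetical, and the step list is reduced to what the thread mentions:
ACK, disable counters, drain PEBS, re-enable counters. It only shows where
the ACK falls relative to the re-enable step in each mode.

```c
#include <stddef.h>

/* Hypothetical model of the PMI handler ordering; not kernel code. */
enum ack_mode { ACK_EARLY, ACK_MID, ACK_LATE };

enum pmi_step { STEP_ACK, STEP_DISABLE, STEP_DRAIN, STEP_ENABLE };

#define PMI_MAX_STEPS 4

/*
 * Fill 'out' with the order of steps for the given ACK mode and
 * return the number of steps written (always PMI_MAX_STEPS here).
 */
size_t pmi_handler_order(enum ack_mode mode, enum pmi_step out[PMI_MAX_STEPS])
{
	size_t n = 0;

	if (mode == ACK_EARLY)
		out[n++] = STEP_ACK;	/* ACK on entry, before touching the PMU */

	out[n++] = STEP_DISABLE;	/* stop the counters */
	out[n++] = STEP_DRAIN;		/* handle overflows, drain PEBS records */

	if (mode == ACK_MID)
		out[n++] = STEP_ACK;	/* ACK right before re-enabling counters */

	out[n++] = STEP_ENABLE;		/* counters run again */

	if (mode == ACK_LATE)
		out[n++] = STEP_ACK;	/* ACK only after counters are re-enabled */

	return n;
}
```

In this model the mid case differs from the late case only by swapping the
last two steps, so whether late_ack chips could use mid ACK comes down to
whether they tolerate the ACK landing just before the re-enable rather than
just after it.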