On Tue, Mar 16, 2021 at 5:28 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote: > > > > On 3/16/2021 3:22 AM, Namhyung Kim wrote: > > Hi Peter and Kan, > > > > On Thu, Mar 4, 2021 at 5:22 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > >> > >> On Wed, Mar 03, 2021 at 02:53:00PM -0500, Liang, Kan wrote: > >>> On 3/3/2021 1:59 PM, Peter Zijlstra wrote: > >>>> On Wed, Mar 03, 2021 at 05:42:18AM -0800, kan.liang@xxxxxxxxxxxxxxx wrote: > >> > >>>>> +++ b/arch/x86/events/intel/ds.c > >>>>> @@ -2000,18 +2000,6 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d > >>>>> continue; > >>>>> } > >>>>> - /* > >>>>> - * On some CPUs the PEBS status can be zero when PEBS is > >>>>> - * racing with clearing of GLOBAL_STATUS. > >>>>> - * > >>>>> - * Normally we would drop that record, but in the > >>>>> - * case when there is only a single active PEBS event > >>>>> - * we can assume it's for that event. > >>>>> - */ > >>>>> - if (!pebs_status && cpuc->pebs_enabled && > >>>>> - !(cpuc->pebs_enabled & (cpuc->pebs_enabled-1))) > >>>>> - pebs_status = cpuc->pebs_enabled; > >>>> > >>>> Wouldn't something like: > >>>> > >>>> pebs_status = p->status = cpus->pebs_enabled; > >>>> > >>> > >>> I didn't consider it as a potential solution in this patch because I don't > >>> think it's a proper way that SW modifies the buffer, which is supposed to be > >>> manipulated by the HW. > >> > >> Right, but then HW was supposed to write sane values and it doesn't do > >> that either ;-) > >> > >>> It's just a personal preference. I don't see any issue here. We may try it. > >> > >> So I mostly agree with you, but I think it's a shame to unsupport such > >> chips, HSW is still a plenty useable chip today. > > > > I got a similar issue on ivybridge machines which caused kernel crash. > > My case it's related to the branch stack with PEBS events but I think > > it's the same issue. And I can confirm that the above approach of > > updating p->status fixed the problem. > > > > I've talked to Stephane about this, and he wants to make it more > > robust when we see stale (or invalid) PEBS records. I'll send the > > patch soon. > > > > Hi Namhyung, > > In case you didn't see it, I've already submitted a patch to fix the > issue last Friday. > https://lore.kernel.org/lkml/1615555298-140216-1-git-send-email-kan.liang@xxxxxxxxxxxxxxx/ > But if you have a more robust proposal, please feel free to submit it. > This fixes the problem on the older systems. The other problem we identified related to the PEBS sample processing code is that you can end up with uninitialized perf_sample_data struct passed to perf_event_overflow(): setup_pebs_fixed_sample_data(pebs, data) { if (!pebs) return; perf_sample_data_init(data); <<< must be moved before the if (!pebs) ... } __intel_pmu_pebs_event(pebs, data) { setup_sample(pebs, data) perf_event_overflow(data); ... } If there is any other reason to get a pebs = NULL in fix_sample_data() or adaptive_sample_data(), then you must call perf_sample_data_init(data) BEFORE you return otherwise you end up in perf_event_overflow() with uninitialized data and you may die as follows: [<ffffffff812f283d>] ? perf_output_copy+0x4d/0xb0 [<ffffffff812e0fb1>] perf_output_sample+0x561/0xab0 [<ffffffff812e0952>] ? __perf_event_header__init_id+0x112/0x130 [<ffffffff812e1be1>] ? perf_prepare_sample+0x1b1/0x730 [<ffffffff812e21b9>] perf_event_output_forward+0x59/0x80 [<ffffffff812e0634>] ? perf_event_update_userpage+0xf4/0x110 [<ffffffff812e4468>] perf_event_overflow+0x88/0xe0 [<ffffffff810175b8>] __intel_pmu_pebs_event+0x328/0x380 This all stems from get_next_pebs_record_by_bit() potentially returning NULL but the NULL is not handled correctly by the callers. This is what I'd like to see cleaned up in __intel_pmu_pebs_event() to avoid future problems. I have a patch that moves the perf_sample_data_init() and I can send it to LKML but it would also need the cleanup for get_next_pebs_record_by_bit() to be complete. Thanks.