On Tue, Mar 16, 2021 at 9:28 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
> On 3/16/2021 3:22 AM, Namhyung Kim wrote:
> > Hi Peter and Kan,
> >
> > On Thu, Mar 4, 2021 at 5:22 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>
> >> On Wed, Mar 03, 2021 at 02:53:00PM -0500, Liang, Kan wrote:
> >>> On 3/3/2021 1:59 PM, Peter Zijlstra wrote:
> >>>> On Wed, Mar 03, 2021 at 05:42:18AM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
> >>
> >>>>> +++ b/arch/x86/events/intel/ds.c
> >>>>> @@ -2000,18 +2000,6 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
> >>>>>                       continue;
> >>>>>               }
> >>>>> -             /*
> >>>>> -              * On some CPUs the PEBS status can be zero when PEBS is
> >>>>> -              * racing with clearing of GLOBAL_STATUS.
> >>>>> -              *
> >>>>> -              * Normally we would drop that record, but in the
> >>>>> -              * case when there is only a single active PEBS event
> >>>>> -              * we can assume it's for that event.
> >>>>> -              */
> >>>>> -             if (!pebs_status && cpuc->pebs_enabled &&
> >>>>> -                     !(cpuc->pebs_enabled & (cpuc->pebs_enabled-1)))
> >>>>> -                     pebs_status = cpuc->pebs_enabled;
> >>>>
> >>>> Wouldn't something like:
> >>>>
> >>>>                      pebs_status = p->status = cpus->pebs_enabled;
> >>>>
> >>>
> >>> I didn't consider it as a potential solution in this patch because I don't
> >>> think it's a proper way that SW modifies the buffer, which is supposed to be
> >>> manipulated by the HW.
> >>
> >> Right, but then HW was supposed to write sane values and it doesn't do
> >> that either ;-)
> >>
> >>> It's just a personal preference. I don't see any issue here. We may try it.
> >>
> >> So I mostly agree with you, but I think it's a shame to unsupport such
> >> chips, HSW is still a plenty useable chip today.
> >
> > I got a similar issue on ivybridge machines which caused kernel crash.
> > My case it's related to the branch stack with PEBS events but I think
> > it's the same issue. And I can confirm that the above approach of
> > updating p->status fixed the problem.
> >
> > I've talked to Stephane about this, and he wants to make it more
> > robust when we see stale (or invalid) PEBS records. I'll send the
> > patch soon.
> >
>
> Hi Namhyung,
>
> In case you didn't see it, I've already submitted a patch to fix the
> issue last Friday.
> https://lore.kernel.org/lkml/1615555298-140216-1-git-send-email-kan.liang@xxxxxxxxxxxxxxx/
> But if you have a more robust proposal, please feel free to submit it.
>
> BTW: The patch set from last Friday also fixed another bug found by the
> perf_fuzzer test. You may be interested.

Right, I missed it. It'd be nice if you could CC me for perf patches later.

Thanks,
Namhyung
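
For reference, the check in the quoted hunk relies on the `x & (x - 1)` idiom, which clears the lowest set bit and is therefore zero only when at most one bit is set. A minimal user-space sketch of that fallback follows. The helper names `has_single_bit` and `resolve_pebs_status` are hypothetical and not kernel functions, and plain `uint64_t` values stand in for `p->status` and `cpuc->pebs_enabled`:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit of mask is set, i.e. one active PEBS event. */
static bool has_single_bit(uint64_t mask)
{
	/* mask & (mask - 1) clears the lowest set bit; 0 means it was the only one. */
	return mask != 0 && (mask & (mask - 1)) == 0;
}

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * choose the status to use for one PEBS record.  A zero status is
 * attributed to the only enabled event when exactly one is enabled;
 * otherwise the record's own status is returned unchanged, so a zero
 * result still means "cannot attribute this record".
 */
static uint64_t resolve_pebs_status(uint64_t record_status, uint64_t pebs_enabled)
{
	if (record_status == 0 && has_single_bit(pebs_enabled))
		return pebs_enabled;
	return record_status;
}
```

With one event enabled (say `pebs_enabled == 0x4`), a zero status resolves to `0x4`. With two events enabled (`0x6`), it stays zero and the caller still has to decide whether to drop the record.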