Re: [PATCH v4 3/3] GHES: Add GHES NMI nice level support

"Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx> · Wed, 03 Oct 2018 11:37:15 +0200

On Wednesday, September 5, 2018 10:28:19 AM CEST Qiuxu Zhuo wrote:
> The current NMI mechanism runs every registered handler for each NMI.
> Because perf uses NMIs, the GHES NMI handler runs unnecessarily on
> every perf NMI. This is captured by the PMU's PEBS (Precise Event
> Based Sampling) and disturbs the perf results.
> 
> GHES NMIs are either very rare, because they are only used in extreme
> error situations, or very frequent, when the machine is dying and error
> floods happen. So add a GHES NMI nice level, set via the GHES platform
> device's sysfs: if it is > 0 and any other NMI (e.g. a PMU NMI) has
> already been handled for the current NMI, skip the GHES NMI handler.
> The next PMU NMI can then be processed early, and the perf result is
> not disturbed by the GHES NMI handler.
> 
> We rely statistically on the property that GHES NMIs are unlikely
> to collide with perf NMIs, or, if they are frequent, there will be
> enough of them that it doesn't matter. It's a heuristic that is not
> 100% correct, but a reasonable one, and it saves a lot of unnecessary
> work for every NMI.
> 
> The test machines have the HEST ACPI table installed and NMI
> notification set; the test commands are 'perf mem record -a sleep 1'
> and 'perf mem report'.
> 
> Before applying patch (perf memory profile):
> 
> On Intel Broadwell-4S:
> 0.63%  1  17910  LFB or LFB hit  [k] intel_pstate_update_util
> 0.59%  1  16960  LFB or LFB hit  [k] intel_pstate_update_util
> ...
> 0.30%  1  8722   L1 or L1 hit    [k] ghes_notify_nmi
> 
> On Intel Skylake-4S:
> 3.45%  1  20218  L1 hit          [k] native_read_msr
> 1.21%  1  7078   LFB hit         [k] intel_pstate_update_util
> ...
> 1.21%  1  7077   N/A miss        [k] ghes_notify_nmi
> 
> After applying the patch and 'echo 1 > /sys/devices/platform/GHES.[0-9]*/nmi_nice':
> No GHES entry showed up in the perf memory profile.
> 
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
> Suggested-by: Ying Huang <ying.huang@xxxxxxxxx>
> Reported-by: Andi Kleen <andi.kleen@xxxxxxxxx>
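
For anyone following the thread without the patch at hand, the skip rule
described in the quoted changelog can be modelled roughly as below. This is
only a user-space sketch, not the patch itself: struct nmi_handler,
run_nmi_chain() and the two stub handlers are hypothetical names, and it
assumes the PMU handler sits ahead of the GHES handler on the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry on the NMI handler chain
 * (not the kernel's struct nmiaction). */
struct nmi_handler {
	const char *name;
	int nice;            /* > 0: skip if this NMI was already handled */
	bool (*fn)(void);    /* returns true if it handled the NMI */
	unsigned long calls; /* how often fn was actually invoked */
};

/* Stub handlers for illustration only. */
static bool pmu_handler(void)  { return true;  } /* perf NMI: always ours */
static bool ghes_handler(void) { return false; } /* no error pending */

/* Walk the chain for one NMI and return how many handlers claimed it.
 * A handler with nice > 0 is skipped once an earlier handler has
 * already handled the current NMI. */
static int run_nmi_chain(struct nmi_handler *h, size_t n)
{
	int handled = 0;

	assert(h != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (h[i].nice > 0 && handled > 0)
			continue; /* "nice" handler yields to earlier ones */
		h[i].calls++;
		if (h[i].fn())
			handled++;
	}
	return handled;
}
```

With nice set to 0 for the GHES entry, the model calls it on every NMI,
which is the behaviour the changelog is trying to avoid.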

Unless this has been applied already, can you CC the entire series to
linux-acpi, please?

Thanks,
Rafael