On 8/3/2022 5:39 am, Jim Mattson wrote:
On Sun, Mar 6, 2022 at 10:38 PM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
From: Like Xu <likexu@xxxxxxxxxxx>
HSW_IN_TX* bits are used in generic code, but they are not supported on
AMD. Worse, these bits overlap with AMD EventSelect[11:8], so using
HSW_IN_TX* bits unconditionally in generic code results in unintended
PMU behavior on AMD. For example, if EventSelect[11:8] is 0x2,
pmc_reprogram_counter() wrongly assumes that HSW_IN_TX_CHECKPOINTED is
set and therefore forces the sampling period to 0.
Opportunistically remove the two TSX-specific parameters from the
generic interface pmc_reprogram_counter().
Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5")
Co-developed-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
Signed-off-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
Note: this patch is based on [1], which is a necessary prerequisite.
[1] https://lore.kernel.org/kvm/20220302111334.12689-1-likexu@xxxxxxxxxxx/
arch/x86/kvm/pmu.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 17c61c990282..d0f9515c37dd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -99,8 +99,7 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 				  u64 config, bool exclude_user,
-				  bool exclude_kernel, bool intr,
-				  bool in_tx, bool in_tx_cp)
+				  bool exclude_kernel, bool intr)
 {
 	struct perf_event *event;
 	struct perf_event_attr attr = {
@@ -116,16 +115,18 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	attr.sample_period = get_sample_period(pmc, pmc->counter);
-	if (in_tx)
-		attr.config |= HSW_IN_TX;
-	if (in_tx_cp) {
-		/*
-		 * HSW_IN_TX_CHECKPOINTED is not supported with nonzero
-		 * period. Just clear the sample period so at least
-		 * allocating the counter doesn't fail.
-		 */
-		attr.sample_period = 0;
-		attr.config |= HSW_IN_TX_CHECKPOINTED;
+	if (guest_cpuid_is_intel(pmc->vcpu)) {
This is not the right condition to check. Per the SDM, both bits 32
and 33 "may only be set if the processor supports HLE or RTM." On
other Intel processors, these bits are reserved, and any attempt to
set them results in a #GP.
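We already have this check, which clears HSW_IN_TX and
HSW_IN_TX_CHECKPOINTED from pmu->reserved_bits only when HLE or RTM is
available on the host and exposed in the guest CPUID: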
	entry = kvm_find_cpuid_entry(vcpu, 7, 0);
	if (entry &&
	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
		pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
+		if (pmc->eventsel & HSW_IN_TX)
+			attr.config |= HSW_IN_TX;
This statement does nothing. If HSW_IN_TX is set in pmc->eventsel, it
is set in attr.config already.
Agreed on the redundancy, since attr.config is already "(eventsel & AMD64_RAW_EVENT_MASK)".
+		if (pmc->eventsel & HSW_IN_TX_CHECKPOINTED) {
+			/*
+			 * HSW_IN_TX_CHECKPOINTED is not supported with nonzero
+			 * period. Just clear the sample period so at least
+			 * allocating the counter doesn't fail.
+			 */
+			attr.sample_period = 0;
+			attr.config |= HSW_IN_TX_CHECKPOINTED;
As above, this statement does nothing. We should just set
attr.sample_period to 0.
Thanks and applied.
Note, however, that the SDM documents an additional constraint which
is ignored here: "This bit may only be set for IA32_PERFEVTSEL2." I
have confirmed that a #GP is raised for an attempt to set bit 33 in
any PerfEvtSeln other than PerfEvtSel2 on a Broadwell Xeon E5.
Yes, "19.3.6.5 Performance Monitoring and Intel® TSX".
I'm not sure whether the host perf scheduler honors this restriction.
cc Kan.
+		}
 	}
 	event = perf_event_create_kernel_counter(&attr, -1, current,
@@ -268,9 +269,7 @@ void reprogram_counter(struct kvm_pmc *pmc)
 			      (eventsel & AMD64_RAW_EVENT_MASK),
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-			      eventsel & ARCH_PERFMON_EVENTSEL_INT,
-			      (eventsel & HSW_IN_TX),
-			      (eventsel & HSW_IN_TX_CHECKPOINTED));
+			      eventsel & ARCH_PERFMON_EVENTSEL_INT);
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
--
2.35.1