You can add a skew between cores in qemu, something like this:

case CSR_INSTRET:
    return core_id() * cpu_get_host_ticks() / 10;
    break;
case CSR_CYCLE:
    return cpu_get_host_ticks();
    break;

Alex

On Wed, Mar 28, 2018 at 7:30 PM, Alan Kao <alankao@xxxxxxxxxxxxx> wrote:
> Hi Alex,
>
> I appreciate your reply and tests.
>
> On Wed, Mar 28, 2018 at 03:58:41PM -0700, Alex Solomatnikov wrote:
>> Did you test this code?
>
> I did test this patch on QEMU's virt model with multi-hart, which is the only
> RISC-V machine I have for now. But as I mentioned in
> https://github.com/riscv/riscv-qemu/pull/115 , the hardware counter support
> in QEMU does not fully conform to the 1.10 Priv-Spec, so I had to slightly
> tweak the code to make reading work.
>
> Specifically, the read to cycle and instret in QEMU looks like this:
> ...
>     case CSR_INSTRET:
>     case CSR_CYCLE:
>     //    if (ctr_ok) {
>             return cpu_get_host_ticks();
>     //    }
>         break;
> ...
> and the two commented-out lines were the tweak.
>
> In such an environment, I did not get anything unexpected. No matter which
> of them is requested, QEMU returns the host's tick.
>
>>
>> I got funny numbers when I tried to run it on HiFive Unleashed:
>>
>> perf stat mem-latency
>> ...
>>
>>  Performance counter stats for 'mem-latency':
>>
>>          157.907000      task-clock (msec)     #    0.940 CPUs utilized
>>                   1      context-switches      #    0.006 K/sec
>>                   1      cpu-migrations        #    0.006 K/sec
>>                4102      page-faults           #    0.026 M/sec
>>           157923752      cycles                #    1.000 GHz
>> 9223372034948899840      instructions          # 58403957087.78 insn per cycle
>>     <not supported>      branches
>>     <not supported>      branch-misses
>>
>>         0.168046000 seconds time elapsed
>>
>> Tracing read_counter(), I see this:
>>
>> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.058809] CPU 3: read_counter idx=0 val=2528358954912
>> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.063339] CPU 3: read_counter idx=1 val=53892244920
>> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.118160] CPU 3: read_counter idx=0 val=2528418303035
>> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.122694] CPU 3: read_counter idx=1 val=53906699665
>> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.216736] CPU 1: read_counter idx=0 val=2528516878664
>> Jan 1 00:41:50 buildroot user.info kernel: [ 2510.221270] CPU 1: read_counter idx=1 val=51986369142
>>
>> It looks like the counter values from different cores are subtracted and
>> wraparound occurs.
>>
>
> Thanks for the hint. It makes sense. 9223372034948899840 is 7fffffff8e66a400,
> which should be a wraparound with the mask I set (63-bit) in the code.
>
> I will try this direction. Ideally, we can solve it by explicitly syncing the
> hwc->prev_count when a cpu migration event happens.
>
>>
>> Also, core IDs and socket IDs are wrong in perf report:
>>
>
> As Palmer has replied to this, I have no comment here.
>
>> perf report --header -I
>> Error:
>> The perf.data file has no samples!
>> # ========
>> # captured on: Thu Jan 1 02:52:07 1970
>> # hostname : buildroot
>> # os release : 4.15.0-00045-g0d7c030-dirty
>> # perf version : 4.15.0
>> # arch : riscv64
>> # nrcpus online : 4
>> # nrcpus avail : 5
>> # total memory : 8188340 kB
>> # cmdline : /usr/bin/perf record -F 1000 lat_mem_rd -P 1 -W 1 -N 1 -t 10
>> # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 1000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
>> # sibling cores : 1
>> # sibling cores : 2
>> # sibling cores : 3
>> # sibling cores : 4
>> # sibling threads : 1
>> # sibling threads : 2
>> # sibling threads : 3
>> # sibling threads : 4
>> # CPU 0: Core ID -1, Socket ID -1
>> # CPU 1: Core ID 0, Socket ID -1
>> # CPU 2: Core ID 0, Socket ID -1
>> # CPU 3: Core ID 0, Socket ID -1
>> # CPU 4: Core ID 0, Socket ID -1
>> # pmu mappings: cpu = 4, software = 1
>> # CPU cache info:
>> # L1 Instruction 32K [1]
>> # L1 Data 32K [1]
>> # L1 Instruction 32K [2]
>> # L1 Data 32K [2]
>> # L1 Instruction 32K [3]
>> # L1 Data 32K [3]
>> # missing features: TRACING_DATA BUILD_ID CPUDESC CPUID NUMA_TOPOLOGY BRANCH_STACK GROUP_DESC AUXTRACE STAT
>> # ========
>>
>>
>> Alex
>>
>
> Many thanks,
> Alan
>
>> On Mon, Mar 26, 2018 at 12:57 AM, Alan Kao <alankao@xxxxxxxxxxxxx> wrote:
>>
>> > This patch provides a basic PMU, riscv_base_pmu, which supports two
>> > general hardware events, instructions and cycles. Furthermore, this
>> > PMU serves as a reference implementation to ease future ports.
>> >
>> > riscv_base_pmu should be able to run on any RISC-V machine that
>> > conforms to the Priv-Spec. Note that the latest qemu model doesn't
>> > yet fully support the proper behavior of Priv-Spec 1.10, but a
>> > workaround should be easy with very small fixes.
>> > Please check
>> > https://github.com/riscv/riscv-qemu/pull/115 for future updates.
>> >
>> > Cc: Nick Hu <nickhu@xxxxxxxxxxxxx>
>> > Cc: Greentime Hu <greentime@xxxxxxxxxxxxx>
>> > Signed-off-by: Alan Kao <alankao@xxxxxxxxxxxxx>
>> > ---
>> >  arch/riscv/Kconfig                  |  12 +
>> >  arch/riscv/include/asm/perf_event.h |  76 +++++-
>> >  arch/riscv/kernel/Makefile          |   1 +
>> >  arch/riscv/kernel/perf_event.c      | 469 ++++++++++++++++++++++++++++++++++++
>> >  4 files changed, 554 insertions(+), 4 deletions(-)
>> >  create mode 100644 arch/riscv/kernel/perf_event.c
>> >
>> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> > index 310b9a5d6737..dd4aecfb5265 100644
>> > --- a/arch/riscv/Kconfig
>> > +++ b/arch/riscv/Kconfig
>> > @@ -195,6 +195,18 @@ config RISCV_ISA_C
>> >  config RISCV_ISA_A
>> >  	def_bool y
>> >
>> > +menu "PMU type"
>> > +	depends on PERF_EVENTS
>> > +
>> > +config RISCV_BASE_PMU
>> > +	bool "Base Performance Monitoring Unit"
>> > +	def_bool y
>> > +	help
>> > +	  A base PMU that serves as a reference implementation and has limited
>> > +	  feature of perf.
>> > +
>> > +endmenu
>> > +
>> >  endmenu
>> >
>> >  menu "Kernel type"
>> > diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
>> > index e13d2ff29e83..98e2efb02d25 100644
>> > --- a/arch/riscv/include/asm/perf_event.h
>> > +++ b/arch/riscv/include/asm/perf_event.h
>> > @@ -1,13 +1,81 @@
>> > +/* SPDX-License-Identifier: GPL-2.0 */
>> >  /*
>> >   * Copyright (C) 2018 SiFive
>> > + * Copyright (C) 2018 Andes Technology Corporation
>> >   *
>> > - * This program is free software; you can redistribute it and/or
>> > - * modify it under the terms of the GNU General Public Licence
>> > - * as published by the Free Software Foundation; either version
>> > - * 2 of the Licence, or (at your option) any later version.
>> >   */
>> >
>> >  #ifndef _ASM_RISCV_PERF_EVENT_H
>> >  #define _ASM_RISCV_PERF_EVENT_H
>> >
>> > +#include <linux/perf_event.h>
>> > +#include <linux/ptrace.h>
>> > +
>> > +#define RISCV_BASE_COUNTERS	2
>> > +
>> > +/*
>> > + * The RISCV_MAX_COUNTERS parameter should be specified.
>> > + */
>> > +
>> > +#ifdef CONFIG_RISCV_BASE_PMU
>> > +#define RISCV_MAX_COUNTERS	2
>> > +#endif
>> > +
>> > +#ifndef RISCV_MAX_COUNTERS
>> > +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
>> > +#endif
>> > +
>> > +/*
>> > + * These are the indexes of bits in counteren register *minus* 1,
>> > + * except for cycle. It would be coherent if it can directly mapped
>> > + * to counteren bit definition, but there is a *time* register at
>> > + * counteren[1]. Per-cpu structure is scarce resource here.
>> > + *
>> > + * According to the spec, an implementation can support counter up to
>> > + * mhpmcounter31, but many high-end processors has at most 6 general
>> > + * PMCs, we give the definition to MHPMCOUNTER8 here.
>> > + */
>> > +#define RISCV_PMU_CYCLE		0
>> > +#define RISCV_PMU_INSTRET	1
>> > +#define RISCV_PMU_MHPMCOUNTER3	2
>> > +#define RISCV_PMU_MHPMCOUNTER4	3
>> > +#define RISCV_PMU_MHPMCOUNTER5	4
>> > +#define RISCV_PMU_MHPMCOUNTER6	5
>> > +#define RISCV_PMU_MHPMCOUNTER7	6
>> > +#define RISCV_PMU_MHPMCOUNTER8	7
>> > +
>> > +#define RISCV_OP_UNSUPP		(-EOPNOTSUPP)
>> > +
>> > +struct cpu_hw_events {
>> > +	/* # currently enabled events*/
>> > +	int n_events;
>> > +	/* currently enabled events */
>> > +	struct perf_event *events[RISCV_MAX_COUNTERS];
>> > +	/* vendor-defined PMU data */
>> > +	void *platform;
>> > +};
>> > +
>> > +struct riscv_pmu {
>> > +	struct pmu *pmu;
>> > +
>> > +	/* generic hw/cache events table */
>> > +	const int *hw_events;
>> > +	const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
>> > +				 [PERF_COUNT_HW_CACHE_OP_MAX]
>> > +				 [PERF_COUNT_HW_CACHE_RESULT_MAX];
>> > +	/* method used to map hw/cache events */
>> > +	int (*map_hw_event)(u64 config);
>> > +	int (*map_cache_event)(u64 config);
>> > +
>> > +	/* max generic hw events in map */
>> > +	int max_events;
>> > +	/* number total counters, 2(base) + x(general) */
>> > +	int num_counters;
>> > +	/* the width of the counter */
>> > +	int counter_width;
>> > +
>> > +	/* vendor-defined PMU features */
>> > +	void *platform;
>> > +};
>> > +
>> >  #endif /* _ASM_RISCV_PERF_EVENT_H */
>> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>> > index 196f62ffc428..849c38d9105f 100644
>> > --- a/arch/riscv/kernel/Makefile
>> > +++ b/arch/riscv/kernel/Makefile
>> > @@ -36,5 +36,6 @@ obj-$(CONFIG_SMP) += smp.o
>> >  obj-$(CONFIG_MODULES) += module.o
>> >  obj-$(CONFIG_FUNCTION_TRACER) += mcount.o
>> >  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
>> > +obj-$(CONFIG_PERF_EVENTS) += perf_event.o
>> >
>> >  clean:
>> > diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
>> > new file mode 100644
>> > index 000000000000..b78cb486683b
>> > --- /dev/null
>> > +++ b/arch/riscv/kernel/perf_event.c
>> > @@ -0,0 +1,469 @@
>> > +/* SPDX-License-Identifier: GPL-2.0 */
>> > +/*
>> > + * Copyright (C) 2008 Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> > + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
>> > + * Copyright (C) 2009 Jaswinder Singh Rajput
>> > + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
>> > + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
>> > + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@xxxxxxxxx>
>> > + * Copyright (C) 2009 Google, Inc., Stephane Eranian
>> > + * Copyright 2014 Tilera Corporation. All Rights Reserved.
>> > + * Copyright (C) 2018 Andes Technology Corporation
>> > + *
>> > + * Perf_events support for RISC-V platforms.
>> > + *
>> > + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
>> > + * functionality for perf event to fully work, this file provides
>> > + * the very basic framework only.
>> > + *
>> > + * For platform portings, please check Documentations/riscv/pmu.txt.
>> > + *
>> > + * The Copyright line includes x86 and tile ones.
>> > + */
>> > +
>> > +#include <linux/kprobes.h>
>> > +#include <linux/kernel.h>
>> > +#include <linux/kdebug.h>
>> > +#include <linux/mutex.h>
>> > +#include <linux/bitmap.h>
>> > +#include <linux/irq.h>
>> > +#include <linux/interrupt.h>
>> > +#include <linux/perf_event.h>
>> > +#include <linux/atomic.h>
>> > +#include <asm/perf_event.h>
>> > +
>> > +static const struct riscv_pmu *riscv_pmu __read_mostly;
>> > +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
>> > +
>> > +/*
>> > + * Hardware & cache maps and their methods
>> > + */
>> > +
>> > +static const int riscv_hw_event_map[] = {
>> > +	[PERF_COUNT_HW_CPU_CYCLES]		= RISCV_PMU_CYCLE,
>> > +	[PERF_COUNT_HW_INSTRUCTIONS]		= RISCV_PMU_INSTRET,
>> > +	[PERF_COUNT_HW_CACHE_REFERENCES]	= RISCV_OP_UNSUPP,
>> > +	[PERF_COUNT_HW_CACHE_MISSES]		= RISCV_OP_UNSUPP,
>> > +	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= RISCV_OP_UNSUPP,
>> > +	[PERF_COUNT_HW_BRANCH_MISSES]		= RISCV_OP_UNSUPP,
>> > +	[PERF_COUNT_HW_BUS_CYCLES]		= RISCV_OP_UNSUPP,
>> > +};
>> > +
>> > +#define C(x) PERF_COUNT_HW_CACHE_##x
>> > +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
>> > +[PERF_COUNT_HW_CACHE_OP_MAX]
>> > +[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
>> > +	[C(L1D)] = {
>> > +		[C(OP_READ)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_WRITE)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_PREFETCH)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +	},
>> > +	[C(L1I)] = {
>> > +		[C(OP_READ)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_WRITE)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_PREFETCH)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +	},
>> > +	[C(LL)] = {
>> > +		[C(OP_READ)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_WRITE)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_PREFETCH)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +	},
>> > +	[C(DTLB)] = {
>> > +		[C(OP_READ)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_WRITE)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_PREFETCH)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +	},
>> > +	[C(ITLB)] = {
>> > +		[C(OP_READ)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_WRITE)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_PREFETCH)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +	},
>> > +	[C(BPU)] = {
>> > +		[C(OP_READ)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_WRITE)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +		[C(OP_PREFETCH)] = {
>> > +			[C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +			[C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +		},
>> > +	},
>> > +};
>> > +
>> > +static int riscv_map_hw_event(u64 config)
>> > +{
>> > +	if (config >= riscv_pmu->max_events)
>> > +		return -EINVAL;
>> > +
>> > +	return riscv_pmu->hw_events[config];
>> > +}
>> > +
>> > +int riscv_map_cache_decode(u64 config, unsigned int *type,
>> > +			   unsigned int *op, unsigned int *result)
>> > +{
>> > +	return -ENOENT;
>> > +}
>> > +
>> > +static int riscv_map_cache_event(u64 config)
>> > +{
>> > +	unsigned int type, op, result;
>> > +	int err = -ENOENT;
>> > +	int code;
>> > +
>> > +	err = riscv_map_cache_decode(config, &type, &op, &result);
>> > +	if (!riscv_pmu->cache_events || err)
>> > +		return err;
>> > +
>> > +	if (type >= PERF_COUNT_HW_CACHE_MAX ||
>> > +	    op >= PERF_COUNT_HW_CACHE_OP_MAX ||
>> > +	    result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
>> > +		return -EINVAL;
>> > +
>> > +	code = (*riscv_pmu->cache_events)[type][op][result];
>> > +	if (code == RISCV_OP_UNSUPP)
>> > +		return -EINVAL;
>> > +
>> > +	return code;
>> > +}
>> > +
>> > +/*
>> > + * Low-level functions: reading/writing counters
>> > + */
>> > +
>> > +static inline u64 read_counter(int idx)
>> > +{
>> > +	u64 val = 0;
>> > +
>> > +	switch (idx) {
>> > +	case RISCV_PMU_CYCLE:
>> > +		val = csr_read(cycle);
>> > +		break;
>> > +	case RISCV_PMU_INSTRET:
>> > +		val = csr_read(instret);
>> > +		break;
>> > +	default:
>> > +		WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS);
>> > +		return -EINVAL;
>> > +	}
>> > +
>> > +	return val;
>> > +}
>> > +
>> > +static inline void write_counter(int idx, u64 value)
>> > +{
>> > +	/* currently not supported */
>> > +}
>> > +
>> > +/*
>> > + * pmu->read: read and update the counter
>> > + *
>> > + * Other architectures' implementation often have a xxx_perf_event_update
>> > + * routine, which can return counter values when called in the IRQ, but
>> > + * return void when being called by the pmu->read method.
>> > + */
>> > +static void riscv_pmu_read(struct perf_event *event)
>> > +{
>> > +	struct hw_perf_event *hwc = &event->hw;
>> > +	u64 prev_raw_count, new_raw_count;
>> > +	u64 oldval;
>> > +	int idx = hwc->idx;
>> > +	u64 delta;
>> > +
>> > +	do {
>> > +		prev_raw_count = local64_read(&hwc->prev_count);
>> > +		new_raw_count = read_counter(idx);
>> > +
>> > +		oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
>> > +					 new_raw_count);
>> > +	} while (oldval != prev_raw_count);
>> > +
>> > +	/*
>> > +	 * delta is the value to update the counter we maintain in the kernel.
>> > +	 */
>> > +	delta = (new_raw_count - prev_raw_count) &
>> > +		((1ULL << riscv_pmu->counter_width) - 1);
>> > +	local64_add(delta, &event->count);
>> > +	/*
>> > +	 * Something like local64_sub(delta, &hwc->period_left) here is
>> > +	 * needed if there is an interrupt for perf.
>> > +	 */
>> > +}
>> > +
>> > +/*
>> > + * State transition functions:
>> > + *
>> > + * stop()/start() & add()/del()
>> > + */
>> > +
>> > +/*
>> > + * pmu->stop: stop the counter
>> > + */
>> > +static void riscv_pmu_stop(struct perf_event *event, int flags)
>> > +{
>> > +	struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
>> > +	hwc->state |= PERF_HES_STOPPED;
>> > +
>> > +	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
>> > +		riscv_pmu_read(event);
>> > +		hwc->state |= PERF_HES_UPTODATE;
>> > +	}
>> > +}
>> > +
>> > +/*
>> > + * pmu->start: start the event.
>> > + */
>> > +static void riscv_pmu_start(struct perf_event *event, int flags)
>> > +{
>> > +	struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +	if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
>> > +		return;
>> > +
>> > +	if (flags & PERF_EF_RELOAD) {
>> > +		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
>> > +
>> > +		/*
>> > +		 * Set the counter to the period to the next interrupt here,
>> > +		 * if you have any.
>> > +		 */
>> > +	}
>> > +
>> > +	hwc->state = 0;
>> > +	perf_event_update_userpage(event);
>> > +
>> > +	/*
>> > +	 * Since we cannot write to counters, this serves as an initialization
>> > +	 * to the delta-mechanism in pmu->read(); otherwise, the delta would be
>> > +	 * wrong when pmu->read is called for the first time.
>> > +	 */
>> > +	if (local64_read(&hwc->prev_count) == 0)
>> > +		local64_set(&hwc->prev_count, read_counter(hwc->idx));
>> > +}
>> > +
>> > +/*
>> > + * pmu->add: add the event to PMU.
>> > + */
>> > +static int riscv_pmu_add(struct perf_event *event, int flags)
>> > +{
>> > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> > +	struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +	if (cpuc->n_events == riscv_pmu->num_counters)
>> > +		return -ENOSPC;
>> > +
>> > +	/*
>> > +	 * We don't have general conunters, so no binding-event-to-counter
>> > +	 * process here.
>> > +	 *
>> > +	 * Indexing using hwc->config generally not works, since config may
>> > +	 * contain extra information, but here the only info we have in
>> > +	 * hwc->config is the event index.
>> > +	 */
>> > +	hwc->idx = hwc->config;
>> > +	cpuc->events[hwc->idx] = event;
>> > +	cpuc->n_events++;
>> > +
>> > +	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
>> > +
>> > +	if (flags & PERF_EF_START)
>> > +		riscv_pmu_start(event, PERF_EF_RELOAD);
>> > +
>> > +	return 0;
>> > +}
>> > +
>> > +/*
>> > + * pmu->del: delete the event from PMU.
>> > + */
>> > +static void riscv_pmu_del(struct perf_event *event, int flags)
>> > +{
>> > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> > +	struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +	cpuc->events[hwc->idx] = NULL;
>> > +	cpuc->n_events--;
>> > +	riscv_pmu_stop(event, PERF_EF_UPDATE);
>> > +	perf_event_update_userpage(event);
>> > +}
>> > +
>> > +/*
>> > + * Interrupt
>> > + */
>> > +
>> > +static DEFINE_MUTEX(pmc_reserve_mutex);
>> > +typedef void (*perf_irq_t)(void *riscv_perf_irq);
>> > +perf_irq_t perf_irq;
>> > +
>> > +void riscv_pmu_handle_irq(void *riscv_perf_irq)
>> > +{
>> > +}
>> > +
>> > +static perf_irq_t reserve_pmc_hardware(void)
>> > +{
>> > +	perf_irq_t old;
>> > +
>> > +	mutex_lock(&pmc_reserve_mutex);
>> > +	old = perf_irq;
>> > +	perf_irq = &riscv_pmu_handle_irq;
>> > +	mutex_unlock(&pmc_reserve_mutex);
>> > +
>> > +	return old;
>> > +}
>> > +
>> > +void release_pmc_hardware(void)
>> > +{
>> > +	mutex_lock(&pmc_reserve_mutex);
>> > +	perf_irq = NULL;
>> > +	mutex_unlock(&pmc_reserve_mutex);
>> > +}
>> > +
>> > +/*
>> > + * Event Initialization
>> > + */
>> > +
>> > +static atomic_t riscv_active_events;
>> > +
>> > +static void riscv_event_destroy(struct perf_event *event)
>> > +{
>> > +	if (atomic_dec_return(&riscv_active_events) == 0)
>> > +		release_pmc_hardware();
>> > +}
>> > +
>> > +static int riscv_event_init(struct perf_event *event)
>> > +{
>> > +	struct perf_event_attr *attr = &event->attr;
>> > +	struct hw_perf_event *hwc = &event->hw;
>> > +	perf_irq_t old_irq_handler = NULL;
>> > +	int code;
>> > +
>> > +	if (atomic_inc_return(&riscv_active_events) == 1)
>> > +		old_irq_handler = reserve_pmc_hardware();
>> > +
>> > +	if (old_irq_handler) {
>> > +		pr_warn("PMC hardware busy (reserved by oprofile)\n");
>> > +		atomic_dec(&riscv_active_events);
>> > +		return -EBUSY;
>> > +	}
>> > +
>> > +	switch (event->attr.type) {
>> > +	case PERF_TYPE_HARDWARE:
>> > +		code = riscv_pmu->map_hw_event(attr->config);
>> > +		break;
>> > +	case PERF_TYPE_HW_CACHE:
>> > +		code = riscv_pmu->map_cache_event(attr->config);
>> > +		break;
>> > +	case PERF_TYPE_RAW:
>> > +		return -EOPNOTSUPP;
>> > +	default:
>> > +		return -ENOENT;
>> > +	}
>> > +
>> > +	event->destroy = riscv_event_destroy;
>> > +	if (code < 0) {
>> > +		event->destroy(event);
>> > +		return code;
>> > +	}
>> > +
>> > +	/*
>> > +	 * idx is set to -1 because the index of a general event should not be
>> > +	 * decided until binding to some counter in pmu->add().
>> > +	 *
>> > +	 * But since we don't have such support, later in pmu->add(), we just
>> > +	 * use hwc->config as the index instead.
>> > +	 */
>> > +	hwc->config = code;
>> > +	hwc->idx = -1;
>> > +
>> > +	return 0;
>> > +}
>> > +
>> > +/*
>> > + * Initialization
>> > + */
>> > +
>> > +static struct pmu min_pmu = {
>> > +	.name		= "riscv-base",
>> > +	.event_init	= riscv_event_init,
>> > +	.add		= riscv_pmu_add,
>> > +	.del		= riscv_pmu_del,
>> > +	.start		= riscv_pmu_start,
>> > +	.stop		= riscv_pmu_stop,
>> > +	.read		= riscv_pmu_read,
>> > +};
>> > +
>> > +static const struct riscv_pmu riscv_base_pmu = {
>> > +	.pmu = &min_pmu,
>> > +	.max_events = ARRAY_SIZE(riscv_hw_event_map),
>> > +	.map_hw_event = riscv_map_hw_event,
>> > +	.hw_events = riscv_hw_event_map,
>> > +	.map_cache_event = riscv_map_cache_event,
>> > +	.cache_events = &riscv_cache_event_map,
>> > +	.counter_width = 63,
>> > +	.num_counters = RISCV_BASE_COUNTERS + 0,
>> > +};
>> > +
>> > +struct pmu * __weak __init riscv_init_platform_pmu(void)
>> > +{
>> > +	riscv_pmu = &riscv_base_pmu;
>> > +	return riscv_pmu->pmu;
>> > +}
>> > +
>> > +int __init init_hw_perf_events(void)
>> > +{
>> > +	struct pmu *pmu = riscv_init_platform_pmu();
>> > +
>> > +	perf_irq = NULL;
>> > +	perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
>> > +	return 0;
>> > +}
>> > +arch_initcall(init_hw_perf_events);
>> > --
>> > 2.16.2
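
For reference, the wraparound discussed in the thread can be reproduced in user space. The following is only a minimal sketch, assuming the same masked-delta arithmetic that riscv_pmu_read() uses in the quoted patch; masked_delta() is a hypothetical helper name, not kernel code, and the sample values are the idx=1 (instret) readings from the read_counter() trace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): applies the same
 * "(new - prev) & mask" arithmetic as riscv_pmu_read() to two raw
 * counter readings, for a counter that is `width` bits wide.
 */
uint64_t masked_delta(uint64_t prev, uint64_t now, unsigned int width)
{
	uint64_t mask;

	assert(width >= 1 && width <= 64);
	/* Shifting a 64-bit value by 64 is undefined, so special-case it. */
	mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);

	/* Unsigned subtraction wraps modulo 2^64 before the mask is applied. */
	return (now - prev) & mask;
}
```

With a width of 63, two readings taken on the same hart (53892244920, then 53906699665 on CPU 3) give a delta of 14454745, as expected. If prev comes from CPU 3 (53906699665) and the next reading from CPU 1 (51986369142), the subtraction goes negative and the masked result is 9223372034934445285, the same kind of huge value as the instruction count perf stat reported.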