On 2024/9/28 1:12, Namhyung Kim wrote:
> On Fri, Sep 27, 2024 at 10:35:54AM +0800, Tengda Wu wrote:
>>
>>
>> On 2024/9/26 12:16, Namhyung Kim wrote:
>>> On Wed, Sep 25, 2024 at 01:55:22PM +0000, Tengda Wu wrote:
>>>> bperf restricts the size of perf_attr_map's entries to 16, which
>>>> cannot hold all events in many scenarios. A typical example is
>>>> when the user specifies `-a -ddd` ([0]). And in other cases such as
>>>> top-down analysis, which often requires a set of more than 16 PMUs
>>>> to be collected simultaneously.
>>>>
>>>> Fix this by increase perf_attr_map entries to 100, and an event
>>>> number check has been introduced when bperf__load() to ensure that
>>>> users receive a more friendly prompt when the event limit is reached.
>>>>
>>>> [0] https://lore.kernel.org/all/20230104064402.1551516-3-namhyung@xxxxxxxxxx/
>>>
>>> Apparently this patch was never applied. I don't know how much you need
>>> but having too many events at the same time won't be very useful because
>>> multiplexing could reduce the accuracy.
>>>
>>
>> Could you please explain why patch [0] was not merged at that time? I couldn't
>> find this information from the previous emails.
>
> I guess it's just fell through the crack. :)

Hope it won't happen again. 😆

>
>>
>> In my scenario, we collect more than 40+ events to support necessary metric
>> calculations, which multiplexing is inevitable. Although multiplexing may
>> reduce accuracy, for the purpose of supporting metric calculations, these
>> accuracy losses can be acceptable. Perf also has the same issue with multiplexing.
>> Removing the event limit for bperf can provide users with additional options.
>>
>> In addition to accuracy, we also care about overhead. I compared the overhead
>> of bperf and perf by testing ./lat_ctx in lmbench [1], and found that the
>> overhead of bperf stat is about 4% less than perf. This is why we choose to
>> use bperf in some extreme scenarios.
>
> Ok, thanks for explanation. I think it's ok to increase the limit.
>
> Thanks,
> Namhyung
>
>>
>> [1] https://github.com/intel/lmbench
>>
>> Thanks,
>> Tengda
>>
>>>
>>>>
>>>> Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
>>>> Signed-off-by: Tengda Wu <wutengda@xxxxxxxxxxxxxxx>
>>>> ---
>>>>  tools/perf/util/bpf_counter.c | 8 +++++++-
>>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
>>>> index 7a8af60e0f51..3346129c20cf 100644
>>>> --- a/tools/perf/util/bpf_counter.c
>>>> +++ b/tools/perf/util/bpf_counter.c
>>>> @@ -28,7 +28,7 @@
>>>>  #include "bpf_skel/bperf_leader.skel.h"
>>>>  #include "bpf_skel/bperf_follower.skel.h"
>>>>
>>>> -#define ATTR_MAP_SIZE 16
>>>> +#define ATTR_MAP_SIZE 100
>>>>
>>>>  static inline void *u64_to_ptr(__u64 ptr)
>>>>  {
>>>> @@ -451,6 +451,12 @@ static int bperf__load(struct evsel *evsel, struct target *target)
>>>>  	enum bperf_filter_type filter_type;
>>>>  	__u32 filter_entry_cnt, i;
>>>>
>>>> +	if (evsel->evlist->core.nr_entries > ATTR_MAP_SIZE) {
>>>> +		pr_err("Too many events, please limit to %d or less\n",
>>>> +		       ATTR_MAP_SIZE);
>>>> +		return -1;
>>>> +	}
>>>> +
>>>>  	if (bperf_check_target(evsel, target, &filter_type, &filter_entry_cnt))
>>>> 		return -1;
>>>>
>>>> --
>>>> 2.34.1
>>>>
>>
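
P.S. For anyone who wants to try the limit discussed above: the bperf path is
selected with perf stat's --bpf-counters option, so something along the lines
of the command below should exercise it. This is only a sketch, it assumes a
perf binary built with BPF skeleton support, and whether `-a -ddd` alone
exceeds 16 events depends on how many detailed events your machine exposes.

    # sketch: count system-wide with the very-detailed event set via bperf
    $ perf stat --bpf-counters -a -ddd -- sleep 1

With the old ATTR_MAP_SIZE of 16 a large enough event list fails to load,
while with this patch it either runs or prints the new "Too many events"
message once the 100-entry cap is hit.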