Re: [PATCH bpf-next 6/6] selftest/bpf/benchs: Add benchmark for hashmap lookups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 23/01/31 02:50, Martin KaFai Lau wrote:
> On 1/31/23 3:05 AM, Anton Protopopov wrote:
> > On 23/01/30 04:22, Martin KaFai Lau wrote:
> > > On 1/27/23 10:14 AM, Anton Protopopov wrote:
> > > > +/* The number of slots to store times */
> > > > +#define NR_SLOTS 32
> > > > +
> > > > +/* Configured by userspace */
> > > > +u64 nr_entries;
> > > > +u64 nr_loops;
> > > > +u32 __attribute__((__aligned__(8))) key[256];
> > > > +
> > > > +/* Filled by us */
> > > > +u64 __attribute__((__aligned__(256))) percpu_times_index[NR_SLOTS]; > +u64 __attribute__((__aligned__(256))) percpu_times[256][NR_SLOTS];
> > > > +
> > > > +static inline void patch_key(u32 i)
> > > > +{
> > > > +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > > > +	key[0] = i + 1;
> > > > +#else
> > > > +	key[0] = __builtin_bswap32(i + 1);
> > > > +#endif
> > > > +	/* the rest of key is random and is configured by userspace */
> > > > +}
> > > > +
> > > > +static int lookup_callback(__u32 index, u32 *unused)
> > > > +{
> > > > +	patch_key(index);
> > > > +	return bpf_map_lookup_elem(&hash_map_bench, key) ? 0 : 1;
> > > > +}
> > > > +
> > > > +static int loop_lookup_callback(__u32 index, u32 *unused)
> > > > +{
> > > > +	return bpf_loop(nr_entries, lookup_callback, NULL, 0) ? 0 : 1;
> > > > +}
> > > > +
> > > > +SEC("fentry/" SYS_PREFIX "sys_getpgid")
> > > > +int benchmark(void *ctx)
> > > > +{
> > > > +	u32 cpu = bpf_get_smp_processor_id();
> > > > +	u32 times_index;
> > > > +	u64 start_time;
> > > > +
> > > > +	times_index = percpu_times_index[cpu & 255] % NR_SLOTS;
> > > 
> > > percpu_times_index only has NR_SLOTS (32) elements?
> > 
> > Yes, the idea was the following. One measurement (bpf prog execution) takes
> > about 20-80 ms (depending on the key/map size). So in 2-3 seconds we can get
> > about NR_SLOTS elements. For me 32 looked like enough to get stats for this
> > benchmark. Do you think this is better to make the NR_SLOTS
> > bigger/configurable?
> 
> I thought percpu_times_index[] is the next slot to use for a particular cpu
> in percpu_times[256][NR_SLOTS]. 256 is the max number of cpu supported? It
> is doing "cpu" & 255 also. Should it be sized as percpu_times_index[256]
> instead then?
> 
> May be #define what 256 is here such that it can have a self describe name.

Oh, thanks! Of course, percpu_times_index is of wrong size,
I will fix this and use names instead of numbers.

> > 
> > > > +	start_time = bpf_ktime_get_ns();
> > > > +	bpf_loop(nr_loops, loop_lookup_callback, NULL, 0);
> > > > +	percpu_times[cpu & 255][times_index] = bpf_ktime_get_ns() - start_time;
> > > > +	percpu_times_index[cpu & 255] += 1;
> > > > +	return 0;
> > > > +}
> > > 
> 



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux