On Fri, Jun 25, 2021 at 12:46:03PM -0700, Yonghong Song wrote:
> 
> 
> On 6/23/21 7:25 PM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > 
> > Restrict bpf timers to array, hash (both preallocated and kmalloced), and
> > lru map types. The per-cpu maps with timers don't make sense, since 'struct
> > bpf_timer' is a part of the map value. bpf timers in per-cpu maps would mean that
> > the number of timers depends on the number of possible cpus and timers would not be
> > accessible from all cpus. lpm map support can be added in the future.
> > The timers in inner maps are supported.
> > 
> > The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free
> > the bpf_timer in a given map element.
> > 
> > Similar to 'struct bpf_spin_lock', BTF is required and it is used to validate
> > that the map element indeed contains 'struct bpf_timer'.
> > 
> > Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when
> > map element data is reused in preallocated htab and lru maps.
> > 
> > Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single
> > map element. There could be one of each, but not more than one. Due to the 'one
> > bpf_timer in one element' restriction do not support timers in global data,
> > since global data is a map of a single element, but from the bpf program side it's
> > seen as many global variables, and a restriction of a single global timer would be
> > odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with
> > timers, since user space could have corrupted the mmap-ed element and crashed the
> > kernel. The maps with timers cannot be read-only. Due to these restrictions
> > search for bpf_timer in datasec BTF in case it was placed in the global data to
> > report a clear error.
> > 
> > The previous patch allowed 'struct bpf_timer' as a first field in a map
> > element only. Relax this restriction.
> > 
> > Refactor the lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free
> > the timer when the lru map deletes an element as a part of its eviction algorithm.
> > 
> > Make sure that a bpf program cannot access 'struct bpf_timer' via direct load/store.
> > The timer operations are done through helpers only.
> > This is similar to 'struct bpf_spin_lock'.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> 
> I didn't find major issues. Only one minor comment below. I do see a race
> during map_update where updated map elements will have the timer removed,
> but at the same time the timer might still be used after the update. But
> the irq spinlock should handle this properly.

Right. It's safe from the kernel pov, but non-atomic from the bpf prog pov.
If one prog is doing map_update_elem() it replaces the other fields except
the timer, but the other fields are also not atomic, so this is not a new
concern for bpf authors.
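To make the copy_map_value() point above a bit more concrete, here is a toy,
userspace-only sketch of the chunked copy it has to perform once a value can
hold both a bpf_spin_lock and a bpf_timer. The struct and function names are
made up for the example and it assumes the skipped regions are sorted and
non-overlapping; it is not the in-tree code.

#include <string.h>

/* A byte range inside the map value that must not be copied,
 * e.g. the bpf_spin_lock or the bpf_timer field.
 */
struct skip_region {
        unsigned int off;
        unsigned int sz;
};

/* Copy value_size bytes from src to dst except the bytes covered by
 * skip[0..nr_skip-1]. Everything outside the skipped regions goes
 * through plain memcpy(), i.e. non-atomically, which is exactly the
 * point above: concurrent readers may observe a mix of old and new
 * fields, with only the timer and spin_lock left untouched.
 */
static void copy_value_skipping(void *dst, const void *src,
                                unsigned int value_size,
                                const struct skip_region *skip, int nr_skip)
{
        unsigned int pos = 0;
        int i;

        for (i = 0; i < nr_skip; i++) {
                memcpy((char *)dst + pos, (const char *)src + pos,
                       skip[i].off - pos);
                pos = skip[i].off + skip[i].sz;
        }
        memcpy((char *)dst + pos, (const char *)src + pos, value_size - pos);
}

With at most one bpf_spin_lock and one bpf_timer per element, nr_skip is at
most two, so the copy degenerates into at most three memcpy() calls.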
> Acked-by: Yonghong Song <yhs@xxxxxx>
> 
> > ---
> >   include/linux/bpf.h        | 44 ++++++++++++-----
> >   include/linux/btf.h        |  1 +
> >   kernel/bpf/arraymap.c      | 22 +++++++++
> >   kernel/bpf/btf.c           | 77 ++++++++++++++++++++++++------
> >   kernel/bpf/hashtab.c       | 96 +++++++++++++++++++++++++++++++++-----
> >   kernel/bpf/local_storage.c |  4 +-
> >   kernel/bpf/map_in_map.c    |  1 +
> >   kernel/bpf/syscall.c       | 21 +++++++--
> >   kernel/bpf/verifier.c      | 30 ++++++++++--
> >   9 files changed, 251 insertions(+), 45 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 72da9d4d070c..90e6c6d1404c 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -198,24 +198,46 @@ static inline bool map_value_has_spin_lock(const struct bpf_map *map)
> >  	return map->spin_lock_off >= 0;
> >  }
> >  
> > -static inline void check_and_init_map_lock(struct bpf_map *map, void *dst)
> > +static inline bool map_value_has_timer(const struct bpf_map *map)
> >  {
> > -	if (likely(!map_value_has_spin_lock(map)))
> > -		return;
> > -	*(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> > -		(struct bpf_spin_lock){};
> > +	return map->timer_off >= 0;
> >  }
> >  
> > -/* copy everything but bpf_spin_lock */
> > +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > +{
> > +	if (unlikely(map_value_has_spin_lock(map)))
> > +		*(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
> > +			(struct bpf_spin_lock){};
> > +	if (unlikely(map_value_has_timer(map)))
> > +		*(struct bpf_timer *)(dst + map->timer_off) =
> > +			(struct bpf_timer){};
> > +}
> > +
> [...]
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 6f6681b07364..e85a5839ffe8 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -228,6 +228,28 @@ static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i)
> >  	return (struct htab_elem *) (htab->elems + i * (u64)htab->elem_size);
> >  }
> >  
> > +static void htab_free_prealloced_timers(struct bpf_htab *htab)
> > +{
> > +	u32 num_entries = htab->map.max_entries;
> > +	int i;
> > +
> > +	if (likely(!map_value_has_timer(&htab->map)))
> > +		return;
> > +	if (!htab_is_percpu(htab) && !htab_is_lru(htab))
> 
> Is !htab_is_percpu(htab) needed? I think we already checked that
> if the map_value has a timer, the map can only be hash/lruhash/array?

Technically not, but it's good to keep this part the same as in
prealloc_init(). I'll refactor these two checks into a small helper
function and will use it in both places.
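Roughly something like the following, where the helper name is only a
placeholder for whatever the respin ends up calling it:

/* kernel/bpf/hashtab.c: the same condition prealloc_init() open-codes
 * today, pulled into one place. The name is illustrative, not from the
 * patch itself.
 */
static bool htab_has_extra_elems(struct bpf_htab *htab)
{
	return !htab_is_percpu(htab) && !htab_is_lru(htab);
}

Then both prealloc_init() and htab_free_prealloced_timers() can test
htab_has_extra_elems(htab) instead of duplicating the two checks, so the
two call sites cannot drift apart.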