On Thu, 16 May 2024 at 16:05, Barret Rhoden <brho@xxxxxxxxxx> wrote:
>
> On 5/14/24 08:40, Siddharth Chintamaneni wrote:
> [...]
> > +static inline int map_lock_inc(struct bpf_queue_stack *qs)
> > +{
> > +        unsigned long flags;
> > +
> > +        preempt_disable();
> > +        local_irq_save(flags);
> > +        if (unlikely(__this_cpu_inc_return(*(qs->map_locked)) != 1)) {
> > +                __this_cpu_dec(*(qs->map_locked));
> > +                local_irq_restore(flags);
> > +                preempt_enable();
> > +                return -EBUSY;
> > +        }
> > +
> > +        local_irq_restore(flags);
> > +        preempt_enable();
>
> it looks like you're taking the approach from kernel/bpf/hashtab.c to
> use a per-cpu lock before grabbing the real lock. but in the success
> case here (where you incremented the percpu counter), you're enabling
> irqs and preemption.
>
> what happens if you get preempted right after this? you've left the
> per-cpu bit set, but then you run on another cpu.

Great catch, that's a bug. It's not a problem when BPF programs call
this, since migration is disabled for them (though it's questionable
whether we should keep preemption enabled between map_lock_inc/dec,
as that increases the chances of conflicts on the same CPU), but it's
certainly a problem when called from the syscall path.

>
> possible alternative: instead of splitting the overall lock into "grab
> percpu lock, then grab real lock", have a single function for both,
> similar to htab_lock_bucket(). and keep irqs and preemption off from
> the moment you start attempting the overall lock until you completely
> unlock.

+1.

>
> barret
>
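
For concreteness, a rough sketch of what that single-function approach
could look like for the queue/stack map, loosely modeled on
htab_lock_bucket()/htab_unlock_bucket() in kernel/bpf/hashtab.c. The
helper names below are made up, and qs->lock / qs->map_locked are
assumed to match the fields used in the patch; this is only an
illustration of the shape, not the actual change:

/*
 * Sketch only: grab the percpu guard and the real lock in one step,
 * and keep irqs/preemption disabled until unlock so the percpu
 * marker is always cleared on the same CPU that set it.
 */
static inline int queue_stack_map_lock(struct bpf_queue_stack *qs,
                                       unsigned long *pflags)
{
        unsigned long flags;

        preempt_disable();
        local_irq_save(flags);
        if (unlikely(__this_cpu_inc_return(*(qs->map_locked)) != 1)) {
                __this_cpu_dec(*(qs->map_locked));
                local_irq_restore(flags);
                preempt_enable();
                return -EBUSY;
        }

        /* still on the same CPU: take the real lock before returning */
        raw_spin_lock(&qs->lock);
        *pflags = flags;

        return 0;
}

static inline void queue_stack_map_unlock(struct bpf_queue_stack *qs,
                                          unsigned long flags)
{
        raw_spin_unlock(&qs->lock);
        __this_cpu_dec(*(qs->map_locked));
        local_irq_restore(flags);
        preempt_enable();
}

Callers would hold irqs and preemption off for the whole critical
section (if queue_stack_map_lock() returns 0, do the work, then call
queue_stack_map_unlock() with the saved flags), which closes the
window where the percpu bit is set but the task can migrate.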