We observed that the lock being taken in this instance is for the LRU
list of the map, which is taken before the bucket lock in
htab_lock_bucket. Hence, htab_lock_bucket does not prevent this
deadlock. Additionally, bpf prog recursion protection logic does not
necessarily prevent bpf perf event programs, which can run in NMI, from
executing in tandem with programs of other types that could be using
the same map.

We thought that returning early would be acceptable here since there
are other circumstances in which htab_lru_map_update_elem can return an
error. But as you say, the map behavior would become random with this
patch.

However, we are unsure how to fix this issue properly. It would be
great to receive feedback on how we can fix it and we'll send a new
patch with that in mind.

On Wed, Aug 21, 2024 at 2:38 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Aug 21, 2024 at 2:30 PM Priya Bala Govindasamy <pgovind2@xxxxxxx> wrote:
> >
> > bpf_common_lru_pop_free uses raw_spin_lock_irqsave. This function is
> > used by htab_lru_map_update_elem() which can be called from an
> > NMI. A deadlock can happen if a bpf program holding the lock is
> > interrupted by the same program in NMI. Use raw_spin_trylock_irqsave if
> > in NMI.
> >
> > Fixes: 3a08c2fd7634 (bpf: LRU list)
> > Signed-off-by: Priya Bala Govindasamy <pgovind2@xxxxxxx>
> > Signed-off-by: Amery Hung <ameryhung@xxxxxxxxx>
>
> Nothing changed since last time exact same patch was posted,
> so same nack as before.
> pw-bot: cr