Re: Potential deadlock in bpf_lpm_trie

Amery Hung <ameryhung@xxxxxxxxx> · Wed, 29 May 2024 15:02:56 -0700

On Wed, May 29, 2024 at 2:50 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, May 29, 2024 at 2:46 PM Amery Hung <ameryhung@xxxxxxxxx> wrote:
> >
> > On Wed, May 29, 2024 at 2:20 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Wed, May 29, 2024 at 8:53 AM Amery Hung <ameryhung@xxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > We are developing a tool to perform static analysis on the bpf
> > > > subsystem to detect locking violations. Our tool reported the
> > > > spin_lock_irqsave() in trie_delete_elem() and trie_update_elem() that
> > > > could be called from an NMI. If a bpf program holding the lock is
> > > > interrupted by the same program in NMI, a deadlock can happen. The
> > > > report was generated for kernel version 6.6-rc4, however, we believe
> > > > this should still exist in the latest kernel.
> > >
> > > Fix it similar to
> > > https://lore.kernel.org/all/20230911132815.717240-1-toke@xxxxxxxxxx/
> > > ?
> >
> > I applied the similar fixing approach to trie->lock, and then I found
> > the two other locks mentioned earlier. My feeling is that there might
> > not be a use case to justify doing trylocks in memcg and rcu. If you
> > think the approach below is okay. I can send a fixing patch.
> >
> > trie_update_elem() {
> > +        if (in_nmi())
> > +                return -EBUSY;
> > }
>
> That's too crude. Trylock is less surprising.
> re: other locks (memcg and rcu).
> I think it's time to switch the whole lpm map to bpf_mem_alloc like we
> did for the hash table.

Thanks for the pointer. That makes sense. I will test and send a fix.