Matthew Wilcox maintains IDR so it's really up to him. I don't think adding a WARN_ON_ONCE() in idr_remove() for NULL returns is a bad idea but we could hide it behind a #if DEBUG_IDR or something and try run syzkaller on it first. regards, dan carpenter On Sat, Nov 16, 2024 at 06:45:37PM +0100, Alexandre Ferrieux wrote: > Hi, > > In the recent fix of u32's IDR leaks: > > 73af53d82076 net: sched: cls_u32: Fix u32's systematic failure to free IDR > entries for hnodes. > > ... one side remark is that the problem went unnoticed for 7 years due to the > NULL result from idr_remove() being ignored at this call site. > > Now, a cursory grep over the whole Linux tree shows 306 out of 386 call sites > (excluding those hidden in macros, if any) don't bother to extract the value > returned by idr_remove(). > > Indeed, a failed IDR removal is "mostly harmless" since IDs are not pointers so > the mismatch is detectable (and is detected, returning NULL). However, in racy > situations you may end up killing an innocent fresh entry, which may really > break things a bit later. And in all cases, a true bug is the root cause. > > So, unless we have reasons to think cls_u32 was the only place where two ID > encodings might lend themselves to confusion, I'm wondering if it wouldn't make > sense to chase the issue more systematically: > > - either with WARN_ON[_ONCE](idr_remove()==NULL) on each call site individually > (a year-long endeavor implying tens of maintainers) > > - or with WARN_ON[_ONCE] just before returning NULL within idr_remove() itself, > or even radix_tree_delete_item() (quicker but possibly disruptive) > > - a variant of the latter being to do it only for harsh bug-hunting builds (the > ones typically used by patrolling bots) > > Opinions ?