On Tue, 17 Nov 2020 20:54:24 -0800 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > extern int > > @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > do { \ > > it_func = (it_func_ptr)->func; \ > > __data = (it_func_ptr)->data; \ > > - ((void(*)(void *, proto))(it_func))(__data, args); \ > > + /* \ > > + * Removed functions that couldn't be allocated \ > > + * are replaced with TRACEPOINT_STUB. \ > > + */ \ > > + if (likely(it_func != TRACEPOINT_STUB)) \ > > + ((void(*)(void *, proto))(it_func))(__data, args); \ > > I think you're overreacting to the problem. I will disagree with that. > Adding run-time check to extremely unlikely problem seems wasteful. Show me a real benchmark that you can notice a problem here. I bet that check is even within the noise of calling an indirect function. Especially on a machine with retpolines. > 99.9% of the time allocate_probes() will do kmalloc from slab of small > objects. > If that slab is out of memory it means it cannot allocate a single page. > In such case so many things will be failing to alloc that system > is unlikely operational. oom should have triggered long ago. > Imo Matt's approach to add __GFP_NOFAIL to allocate_probes() Looking at the GFP_NOFAIL comment: * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller * cannot handle allocation failures. The allocation could block * indefinitely but will never return with failure. Testing for * failure is pointless. * New users should be evaluated carefully (and the flag should be * used only when there is no reasonable failure policy) but it is * definitely preferable to use the flag rather than opencode endless * loop around allocator. I realized I made a mistake in my patch for using it, as my patch is a failure policy. It looks like something we want to avoid in general. Thanks, I'll send a v3 that removes it. > when it's called from func_remove() is much better. > The error was reported by syzbot that was using > memory fault injections. ENOMEM in allocate_probes() was > never seen in real life and highly unlikely will ever be seen. And the biggest thing you are missing here, is that if you are running on a machine that has static calls, this code is never hit unless you have more than one callback on a single tracepoint. That's because when there's only one callback, it gets called directly, and this loop is not involved. -- Steve