On Mon, Nov 4, 2024 at 3:55 AM Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> wrote:
>
> >>                 *(unsigned long *)(udata + moff) = prog->aux->id;
> >> +
> >> +               /* init ksym for this trampoline */
> >> +               bpf_struct_ops_ksym_init(prog, image + trampoline_start,
> >> +                                        image_off - trampoline_start,
> >> +                                        ksym++);
> >
> > Thanks for the patch.
> > I think it's overkill to add a ksym for each callsite within a single
> > trampoline.
> > 1. The prog name will be next in the stack. No need to duplicate it.
> > 2. ksym-ing callsites this way is quite unusual.
> > 3. Consider an irq on other insns within a trampoline.
> >    The unwinder won't find anything in such a case.
> >
> > So I suggest adding only one ksym that covers the whole trampoline.
> > The name could be "bpf_trampoline_structopsname",
> > which is probably st_ops_desc->type.
>
> IIUC, the "whole trampoline" for a struct_ops is actually the page
> array st_map->image_pages[MAX_TRAMP_IMAGE_PAGES], where each page is
> allocated by arch_alloc_bpf_trampoline(PAGE_SIZE).
>
> Since the virtual addresses of these pages are *NOT* guaranteed to
> be contiguous, I don't think we can create a single ksym for them.
>
> And if we add a ksym for each individual page, it seems we will end
> up with an odd name for each ksym.

I see. Good point. Ok. Let's add a ksym for each callback.

> Given that each page consists of one or more bpf trampolines, which
> are no different from bpf trampolines for other prog types, such as
> bpf trampolines for fentry, and since each bpf trampoline for other
> prog types already has a ksym, I think it is not unusual to add a
> ksym for each single bpf trampoline in the page.
>
> And, since there are no instructions between adjacent bpf trampolines
> within a page, nothing between two trampolines can be interrupted.
>
> For the name, bpf_trampoline_<struct_ops_name>_<member_name>, like
> bpf_trampoline_tcp_congestion_ops_pkts_acked, seems appropriate.

Agree. This naming convention makes sense.
I'd only shorten the prefix to 'bpf_tramp_' or even 'bpf__' (with double
underscore). It's kinda obvious that it's a trampoline, and it's an
implementation detail that doesn't need to be present in the name.

> >>         }
> >>
> >>         if (st_ops->validate) {
> >> @@ -790,6 +829,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
> >>  unlock:
> >>         kfree(tlinks);
> >>         mutex_unlock(&st_map->lock);
> >> +       if (!err)
> >> +               bpf_struct_ops_map_ksyms_add(st_map);
> >>         return err;
> >>  }
> >>
> >> @@ -883,6 +924,10 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
> >>          */
> >>         synchronize_rcu_mult(call_rcu, call_rcu_tasks);
> >>
> >> +       /* no trampoline in the map is running anymore, delete symbols */
> >> +       bpf_struct_ops_map_ksyms_del(st_map);
> >> +       synchronize_rcu();
> >> +
> >
> > This is substantial overhead, and why?
> > synchronize_rcu_mult() is right above.
>
> I think we should ensure no trampoline is running or could run before
> its ksym is deleted from the symbol table. If this order is not
> ensured, a trampoline can be interrupted by a perf irq after its
> symbol is deleted, resulting in a broken stacktrace, since the
> trampoline symbol could not be found by the perf irq handler.
>
> This patch deletes ksyms after synchronize_rcu_mult() to ensure this
> order.

But the overhead is prohibitive. We had broken stacks with st_ops for a
long time, so it may still hit the 0.001% case where st_ops are being
switched, as the comment in bpf_struct_ops_map_free() explains.

As a separate cleanup, I would switch the freeing to call_rcu_tasks.
Synchronous waiting is expensive.

Martin, any suggestions?