On Thu, Feb 6, 2025 at 1:59 AM Song Liu <song@xxxxxxxxxx> wrote: > > On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@xxxxxxxxxx> wrote: > > > > > > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > [...] > > > > > > > > If you’re managing a large fleet of servers, this issue is far from negligible. > > > > > > > > > > > > > > > Can you provide examples of companies that use atomic replacement at > > > > > > scale in their production environments? > > > > > > > > > > At least SUSE uses it as a solution for its customers. No many problems > > > > > have been reported since we started ~10 years ago. > > > > > > We (Meta) always use atomic replacement for our live patches. > > > > > > > > > > > Perhaps we’re running different workloads. > > > > Going back to the original purpose of livepatching: is it designed to address > > > > security vulnerabilities, or to deploy new features? > > > > If it’s the latter, then there’s definitely a lot of room for improvement. > > > > > > We only use KLP to fix bugs and security vulnerabilities. We do not use > > > live patches to deploy new features. > > > > +BPF > > > > Hello Song, > > > > Since bpf_fexit also uses trampolines, I was curious about what would > > happen if I attached do_exit() to fexit. Unfortunately, it triggers a > > bug in BPF as well. The BPF program is as follows: > > > > SEC("fexit/do_exit") > > int fexit_do_exit > > { > > return 0; > > } > > > > After the fexit program exits, the trampoline is still left over: > > > > $ bpftool link show <<<< nothing output > > $ grep "bpf_trampoline_[0-9]" /proc/kallsyms > > ffffffffc04cb000 t bpf_trampoline_6442526459 [bpf] > > I think we should first understand why the trampoline is not > freed. IIUC, the fexit works as follows, bpf_trampoline + __bpf_tramp_enter + percpu_ref_get(&tr->pcref); + call do_exit() + __bpf_tramp_exit + percpu_ref_put(&tr->pcref); Since do_exit() never returns, the refcnt of the trampoline image is never decremented, preventing it from being freed. > > > We could either add functions annotated as "__noreturn" to the deny > > list for fexit as follows, or we could explore a more generic > > solution, such as embedding the "noreturn" information into the BTF > > and extracting it when attaching fexit. > > I personally don't think this is really necessary. It is good to have. > But a reasonable user should not expect noreturn function to > generate fexit events. If we don't plan to fix it, we should clearly document it to guide users and advise them against using it. -- Regards Yafang