Hello, Suren.

On Wed, May 03, 2023 at 10:42:11AM -0700, Suren Baghdasaryan wrote:
> > * The framework doesn't really have any runtime overhead, so we can have it
> >   deployed in the entire fleet and debug wherever problem is.
>
> Do you mean it has no runtime overhead when disabled?

Yes, that's what I meant.

> If so, do you know what's the overhead when enabled? I want to
> understand if that's truly a viable solution to track all allocations
> (including slab) all the time.

(cc'ing Alexei and Andrii who know a lot better than me)

I don't have enough concrete benchmark data on hand to answer definitively
but hopefully my general impression will help.

We attach BPF programs to both per-packet and per-IO paths. They obviously
aren't free but their overhead isn't significantly higher than building in
the same thing in C code. Once loaded, BPF progs are JIT-compiled into native
code. The generated code will be a bit worse than regularly compiled C code
but those are really micro differences. There's some bridging code to jump
into BPF but that's again negligible / acceptable even in the hottest paths.

In terms of execution overhead, I don't think there is a significant
disadvantage to doing these things in BPF. Bigger differences would likely
be in tracking data structures and locking around them. One can definitely
better integrate tracking into alloc / free paths piggybacking on existing
locking and whatnot. That said, BPF hashtable is pretty fast and BPF is
constantly improving in terms of data structure support.

It really depends on the workload and how much overhead one considers
acceptable and I'm sure persistent global tracking can be done more
efficiently with built-in C code. That said, done right, the overhead
difference most likely isn't gonna be orders of magnitude but more like in
the realm of tens of percents, if that.

So, it doesn't nullify the benefits a dedicated mechanism can bring but does
change the conversation quite a bit.
Is the extra code justifiable given that most of what it enables is already
possible using a more generic mechanism, albeit at a bit higher cost? That
may well be the case but it does raise the bar.

Thanks.

-- 
tejun