On Tue, Apr 19, 2022 at 12:20:39PM -0700, Linus Torvalds wrote:
> On Tue, Apr 19, 2022 at 11:42 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> >
> > I'd say that bpf_prog_pack was a cure for symptoms and this project tries
> > to address more general problem.
> > But you are right, it'll take some time and won't land in 5.19.
>
> Just to update people: I've just applied Song's [1/4] patch, which
> means that the whole current hugepage vmalloc thing is effectively
> disabled (because nothing opts in).
>
> And I suspect that will be the status for 5.18, unless somebody comes
> up with some very strong arguments for (re-)starting using huge pages.

Here is the quote from Song's cover letter for the bpf_prog_pack series:

  Most BPF programs are small, but they consume a page each. For systems
  with busy traffic and many BPF programs, this could also add significant
  pressure to instruction TLB. High iTLB pressure usually causes slow down
  for the whole system, which includes visible performance degradation for
  production workloads.

The last sentence is the key. We've added this feature not because of bpf
programs themselves. So calling this feature an optimization is not quite
correct. The number of bpf programs on the production server doesn't matter.
The programs come and go all the time. That is the key here.
The 4k module_alloc() plus set_memory_ro/x done by the JIT break down huge
pages and increase TLB pressure on the kernel code. That creates visible
performance degradation for normal user space workloads that are not doing
anything bpf related. mm folks can fill in the details here. My understanding
is that it's something to do with identity mapping.

So we're not trying to improve bpf performance. We're trying to make sure
that bpf program load/unload doesn't affect the speed of the kernel.
Generalizing bpf_prog_alloc to modules would be nice, but it's not clear
what benefits such optimization might have. It's orthogonal here.
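
As a rough illustration of the "consume a page each" point in the cover
letter, the userspace sketch below compares two layouts for a set of small
programs: the number of 4k pages they occupy when each gets its own
page-granular allocation, and the number of 2MB packs they need when
placed back to back. This is not the kernel code. The names, the 64-byte
chunk rounding and the simple bump allocation are assumptions made for the
example, and it assumes no single program is larger than a pack.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096UL               /* 4k base page */
#define SKETCH_PACK_SIZE  (2UL * 1024 * 1024)  /* one 2MB huge page */
#define SKETCH_CHUNK      64UL                 /* assumed packing granularity */

/* One allocation per program: each size rounds up to whole 4k pages. */
static size_t sketch_pages_one_per_prog(const size_t *sizes, size_t n)
{
	size_t pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += (sizes[i] + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	return pages;
}

/*
 * Packed layout: programs are rounded up to SKETCH_CHUNK and placed back
 * to back. A new pack is opened when the current one cannot hold the
 * next program. Assumes every program fits in a single pack.
 */
static size_t sketch_packs_needed(const size_t *sizes, size_t n)
{
	size_t packs = 0;
	size_t used = SKETCH_PACK_SIZE; /* forces a pack for the first program */

	for (size_t i = 0; i < n; i++) {
		size_t sz = (sizes[i] + SKETCH_CHUNK - 1) / SKETCH_CHUNK * SKETCH_CHUNK;

		if (used + sz > SKETCH_PACK_SIZE) {
			packs++;
			used = 0;
		}
		used += sz;
	}
	return packs;
}
```

For four programs of 200, 700, 3000 and 5000 bytes,
sketch_pages_one_per_prog() returns 5 and sketch_packs_needed() returns 1.
Every separately allocated page is also a separate target for
set_memory_ro/x, which is the part the text above is concerned with.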

So I argue that all 4 of Song's fixes are necessary in 5.18.
We need an additional zeroing patch too, of course, to make sure the huge
page doesn't contain garbage at alloc time and is cleaned after a prog is
unloaded.

Regarding JIT spraying and other concerns, the short answer: nothing changed.
JIT spraying was mitigated with start address randomization and invalid
instruction padding. Both features are still present.
Constant blinding is also fully functional.

Any kind of generalization of bpf_prog_pack into a general mm feature would
be nice, but it cannot be done as an opportunistic cache.
We need a guarantee that bpf prog load/unload won't recreate the issue with
kernel performance degradation. I suspect we would need bpf_prog_pack in the
current form for the foreseeable future.
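
For readers unfamiliar with the two JIT spraying mitigations named above
(start address randomization and invalid instruction padding), and with
the cleanup on unload, here is a minimal userspace sketch. It is not the
kernel implementation. The function names are hypothetical, the random
value is passed in by the caller, and the 0xcc fill byte (x86 int3) is
used as a stand-in for the padding instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FILL_BYTE 0xcc /* x86 int3, assumed padding byte */

/*
 * Fill the whole buffer with trap bytes, then copy the image to a
 * randomized, aligned offset inside it. `align` must be a power of two
 * and `rnd` comes from the caller's random source. Returns the chosen
 * offset, or (size_t)-1 if the image does not fit.
 */
static size_t sketch_place_image(uint8_t *buf, size_t buf_len,
				 const uint8_t *image, size_t image_len,
				 size_t align, unsigned int rnd)
{
	size_t hole, start;

	if (image_len > buf_len)
		return (size_t)-1;

	memset(buf, SKETCH_FILL_BYTE, buf_len);
	hole = buf_len - image_len;                 /* slack left for the offset */
	start = (rnd % (hole + 1)) & ~(align - 1);  /* random, rounded down */
	memcpy(buf + start, image, image_len);
	return start;
}

/* Overwrite an unloaded image with trap bytes so no stale code remains. */
static void sketch_scrub_image(uint8_t *buf, size_t start, size_t image_len)
{
	memset(buf + start, SKETCH_FILL_BYTE, image_len);
}
```

Because the offset is rounded down after the modulo, the image always
ends inside the buffer, and every byte outside the image traps if
execution ever lands on it.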