On Mon, Jun 20, 2022 at 07:51:24PM -0700, Song Liu wrote:
> On Mon, Jun 20, 2022 at 6:32 PM Aaron Lu <aaron.lu@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 20, 2022 at 09:03:52AM -0700, Song Liu wrote:
> > > Hi Aaron,
> > >
> > > On Mon, Jun 20, 2022 at 4:12 AM Aaron Lu <aaron.lu@xxxxxxxxx> wrote:
> > > >
> > > > Hi Song,
> > > >
> > > > On Fri, May 20, 2022 at 04:57:50PM -0700, Song Liu wrote:
> > > >
> > > > ... ...
> > > >
> > > > > The primary goal of bpf_prog_pack is to reduce iTLB miss rate and reduce
> > > > > direct memory mapping fragmentation. This leads to non-trivial performance
> > > > > improvements.
> > > > >
> > > > > For our web service production benchmark, bpf_prog_pack on 4kB pages
> > > > > gives 0.5% to 0.7% more throughput than not using bpf_prog_pack.
> > > > > bpf_prog_pack on 2MB pages gives 0.6% to 0.9% more throughput than not
> > > > > using bpf_prog_pack. Note that 0.5% is a huge improvement for our fleet.
> > > > > I believe this is also significant for other companies with many
> > > > > thousand servers.
> > > > >
> > > >
> > > > I'm evaluating the performance impact of direct memory mapping
> > > > fragmentation, and seeing the above, I wonder: is the performance
> > > > improvement mostly due to prog pack and hugepages rather than less
> > > > direct mapping fragmentation?
> > > >
> > > > I can understand that when progs are packed together, the iTLB miss
> > > > rate will be reduced and thus performance can be improved. But I don't
> > > > immediately see how direct mapping fragmentation can impact
> > > > performance, since the bpf code is running from the module alias
> > > > addresses, not the direct mapping addresses, IIUC?
> > >
> > > You are right that BPF code runs from module alias addresses. However, to
> > > protect text from overwrites, we use set_memory_x() and set_memory_ro()
> > > for the BPF code. These two functions will set permissions for all aliases
> > > of the memory, including the direct map, and thus cause fragmentation of
> > > the direct map. Does this make sense?
> >
> > Guess I didn't make it clear.
> >
> > I understand that set_memory_XXX() will cause the direct mapping to split
> > and thus become fragmented. What is not clear to me is how much impact
> > direct mapping fragmentation has on performance, in your case and in
> > general.
> >
> > In your case, I guess the performance gain comes from code getting packed
> > together and iTLB misses being reduced. When there is a lot of code,
> > packing it together into a hugepage is a further gain. Meanwhile, the
> > direct mapping split (or not) seems to be a side effect of this packing,
> > but it doesn't have a direct impact on performance.
> >
> > One thing I can imagine is that when an area of the direct mapping gets
> > split for permission reasons, and that reason later goes away (e.g. module
> > unload or bpf code unload), the area remains fragmented. Later operations
> > that touch the same area then need more dTLB entries, which can be bad for
> > performance, but it's hard to say how much impact this causes.
>
> Yes, we have data showing the direct mapping remaining fragmented
> can cause non-trivial performance degradation. For our web workload,
> the difference is on the order of 1%.

Many thanks for the info, really appreciate it.

Regards,
Aaron