Re: [PATCH v4 bpf-next 0/8] bpf_prog_pack followup

Song Liu <song@xxxxxxxxxx> · Mon, 20 Jun 2022 19:51:24 -0700

On Mon, Jun 20, 2022 at 6:32 PM Aaron Lu <aaron.lu@xxxxxxxxx> wrote:
>
> On Mon, Jun 20, 2022 at 09:03:52AM -0700, Song Liu wrote:
> > Hi Aaron,
> >
> > On Mon, Jun 20, 2022 at 4:12 AM Aaron Lu <aaron.lu@xxxxxxxxx> wrote:
> > >
> > > Hi Song,
> > >
> > > On Fri, May 20, 2022 at 04:57:50PM -0700, Song Liu wrote:
> > >
> > > ... ...
> > >
> > > > The primary goal of bpf_prog_pack is to reduce iTLB miss rate and reduce
> > > > direct memory mapping fragmentation. This leads to non-trivial performance
> > > > improvements.
> > > >
> > > > For our web service production benchmark, bpf_prog_pack on 4kB pages
> > > > gives 0.5% to 0.7% more throughput than not using bpf_prog_pack.
> > > > bpf_prog_pack on 2MB pages 0.6% to 0.9% more throughput than not using
> > > > bpf_prog_pack. Note that 0.5% is a huge improvement for our fleet. I
> > > > believe this is also significant for other companies with many thousand
> > > > servers.
> > > >
> > >
> > > I'm evaluationg performance impact due to direct memory mapping
> > > fragmentation and seeing the above, I wonder: is the performance improve
> > > mostly due to prog pack and hugepage instead of less direct mapping
> > > fragmentation?
> > >
> > > I can understand that when progs are packed together, iTLB miss rate will
> > > be reduced and thus, performance can be improved. But I don't see
> > > immediately how direct mapping fragmentation can impact performance since
> > > the bpf code are running from the module alias addresses, not the direct
> > > mapping addresses IIUC?
> >
> > You are right that BPF code runs from module alias addresses. However, to
> > protect text from overwrites, we use set_memory_x() and set_memory_ro()
> > for the BPF code. These two functions will set permissions for all aliases
> > of the memory, including the direct map, and thus cause fragmentation of
> > the direct map. Does this make sense?
>
> Guess I didn't make it clear.
>
> I understand that set_memory_XXX() will cause direct mapping split and
> thus, fragmented. What is not clear to me is, how much impact does
> direct mapping fragmentation have on performance, in your case and in
> general?
>
> In your case, I guess the performance gain is due to code gets packed
> together and iTLB gets reduced. When code are a lot, packing them
> together as a hugepage is a further gain. In the meantime, direct
> mapping split (or not) seems to be a side effect of this packing, but it
> doesn't have a direct impact on performance.
>
> One thing I can imagine is, when an area of direct mapping gets splited
> due to permission reason, when that reason is gone(like module unload
> or bpf code unload), those areas will remain fragmented and that can
> cause later operations that touch these same areas using more dTLBs
> and that can be bad for performance, but it's hard to say how much
> impact this can cause though.

Yes, we have data showing the direct mapping remaining fragmented
can cause non-trivial performance degradation. For our web workload,
the difference is in the order of 1%.

Thanks,
Song