Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> On Thu, 2015-05-28 at 22:51 +0200, Florian Westphal wrote:
> > We store rule blob per (possible) cpu.  Unfortunately this means we can
> > waste lot of memory on big smp machines. ipt_entry structure ('rule head')
> > is 112 byte, so e.g. with maxcpu=64 one single rule eats close to 8k RAM.
> >
> > Since previous patch moved counters to separate percpu blob, it appears
> > there is nothing left in the rule blob that must be percpu.
> >
> > Thus only duplicate the rule blob for each NUMA node.
> >
> > On my test system (144 possible cpus, one numa node, 400k dummy rules) this
> > change saves close to 9 Gigabyte of RAM.
> >
> > Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
> > Acked-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
> > ---
>
> Really if the program is now readonly, I would keep a single copy in
> memory.

Some matches (limit, for instance) store a kernel data pointer in their
matchinfo data (set from the checkentry hook, not from the per-packet
match function), so it's not 100% readonly.

> Are we copying kernel text to each NUMA node ? ;)

Beats me.  I was under the impression that a cpu accessing memory on
another node incurs an access penalty; that's why I changed it to per-node
allocation.

Is it insignificant in practice?

If so, I can respin it w/o the numa duplication; we can still add it
back later if needed.
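
For reference, the memory figures quoted from the patch description can be
sanity-checked with a rough calculation.  The sketch below is only a
back-of-the-envelope estimate, not anything taken from the patch itself: the
helper names are made up for illustration, and the 40-byte per-rule target
size used in the second estimate is an assumption (the thread only states the
112-byte rule head).

```python
# Back-of-the-envelope estimate of rule blob memory use.
# Only IPT_ENTRY_SIZE, the cpu/node counts and the rule count come from
# the thread; everything else here is an illustrative assumption.

IPT_ENTRY_SIZE = 112        # bytes, 'rule head' size quoted in the thread
ASSUMED_TARGET_SIZE = 40    # bytes, ASSUMPTION: a standard target per dummy rule


def blob_bytes(rule_size, n_rules, n_copies):
    """Total bytes used when the rule blob is duplicated n_copies times."""
    return rule_size * n_rules * n_copies


def savings_bytes(rule_size, n_rules, possible_cpus, numa_nodes):
    """Bytes saved by keeping one copy per NUMA node instead of per cpu."""
    per_cpu = blob_bytes(rule_size, n_rules, possible_cpus)
    per_node = blob_bytes(rule_size, n_rules, numa_nodes)
    return per_cpu - per_node


if __name__ == "__main__":
    # One rule head duplicated for 64 possible cpus.
    print(blob_bytes(IPT_ENTRY_SIZE, 1, 64), "bytes for one rule, maxcpu=64")

    # Test system from the thread: 144 possible cpus, 1 node, 400k rules.
    head_only = savings_bytes(IPT_ENTRY_SIZE, 400_000, 144, 1)
    with_target = savings_bytes(
        IPT_ENTRY_SIZE + ASSUMED_TARGET_SIZE, 400_000, 144, 1
    )
    print(f"rule head only:      {head_only / 1e9:.2f} GB saved")
    print(f"head + assumed tgt:  {with_target / 1e9:.2f} GB saved")
```

Counting the rule heads alone gives roughly 6.4 GB saved on the test system;
the reported "close to 9 Gigabyte" would be consistent with each dummy rule
also carrying a target of about 40 bytes, but that part is a guess.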