On 2024-08-13 16:37:19 [+0200], Florian Westphal wrote: > Hi Sebastian! Hi Florian, > > While trying to trigger the writer side, I managed only to trigger a > > single reader and only while using iptables-legacy/ arptables-legacy > > commands. The nft did not trigger it. So it is legacy code only. > > Yes, this is legacy only. perfect. > > Would it work to convert the counters to u64_stats_sync? On 32bit > > there would be a seqcount_t with preemption disabling during the > > update which means the xt_write_recseq_begin()/ xt_write_recseq_end() > > has to be limited the counter update only. On 64bit architectures there > > would be just the update. This means that number of packets and bytes > > might be "off" (the one got updated, the other not "yet") but I don't > > think that this is a problem here. > > Unfortunately its not only about counters; local_bh_disable() is also > used to prevent messing up the chain jump stack. Okay. But I could get rid of the counters/ seqcount and worry about the other things later on? > For local hooks, this is called from process context, so in order > to avoid timers kicking in and then re-using the jumpstack, this > local_bh_disable avoids that. > > The chain stack is percpu in -legacy, and on-stack in nf_tables. > > Then, there is also recursion via xt_TEE.c, hence this strange > if (static_key_false(&xt_tee_enabled)) > > in ipt_do_table() (We'll switch to a shadow-stack for that case). Yeah. That is another per-CPU thingy. So my plan was to replace the seqcount/ counters with u64_stats_sync, make this per-CPU nf_skb_duplicated per-TASK and then come back and check what local_bh_disable() is actually protecting other than skb alloc/ free during callback invocation (as in TEE). So jumpstack. This is exclusively used by ipt_do_table(). Not sure how a timer comes here but I goes any softirq (as in NAPI) would do the job without actually disabling BH. Can this be easily transformed to the on-stack thingy that nft is using or is it completely different? If the latter I would just a add a lock. local_lock_nested_bh() would be the easiest to not upset anyone but this is using hand crafted per-CPU memory instead of alloc_percpu(). Can stacksize get extremely huge? In case I made a wrong turn somewhere, do you have better suggestions? Sebastian