Hi,

On 8/23/2022 8:56 AM, Hao Luo wrote:
> On Mon, Aug 22, 2022 at 11:01 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>> On Mon, Aug 22, 2022 at 5:08 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On 8/22/2022 11:21 AM, Hao Luo wrote:
>>>> Hi, Hou Tao
>>>>
>>>> On Sun, Aug 21, 2022 at 6:28 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 8/22/2022 12:42 AM, Hao Luo wrote:
>>>>>> Hi Hou Tao,
>>>>>>
>>>>>> On Sat, Aug 20, 2022 at 8:14 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>>>>>
>>>>>>> Per-cpu htab->map_locked is used to prohibit concurrent accesses
>>>>>>> from both NMI and non-NMI contexts. But since 74d862b682f51 ("sched:
>>>>>>> Make migrate_disable/enable() independent of RT"), migrate_disable()
>>>>>>> is also preemptible in the !PREEMPT_RT case, so now map_locked also
>>>>>>> unexpectedly rejects concurrent updates from normal contexts (e.g.
>>>>>>> userspace processes), as shown below:
>>>>>>>
>>>>>>> process A                      process B
>>>>>>>
SNIP
>>>>>>> First please enable CONFIG_PREEMPT for the running kernel and then run
>>>>>>> the following test program as shown below.
>>>>>>>
>> Ah, fully preemptive kernel. It's worth mentioning that in the commit
>> message. Then it seems promoting migrate_disable to preempt_disable
>> may be the best way to solve the problem you described.

Sorry, I missed that. Will add it in v2. A rough sketch of the reproducer
is also appended at the end of this mail.

>>
>>> # sudo taskset -c 2 ./update.bin
>>> thread nr 2
>>> wait for error
>>> update error -16
>>> all threads exit
>>>
>>> If there is no "update error -16", you can try to create more map update
>>> threads. For example, running 16 update threads:
>>>
>>> # sudo taskset -c 2 ./update.bin 16
>>> thread nr 16
>>> wait for error
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> update error -16
>>> all threads exit
SNIP
> Tao, thanks very much for the test. I played with it a bit and I can
> confirm that map_update failures are seen under CONFIG_PREEMPT. The
> failures are not present under CONFIG_PREEMPT_NONE or
> CONFIG_PREEMPT_VOLUNTARY. I experimented with a few alternatives I was
> thinking of and they didn't work. It looks like Hou Tao's idea,
> promoting migrate_disable to preempt_disable, is probably the best we
> can do for the non-RT case. So
>
> Reviewed-by: Hao Luo <haoluo@xxxxxxxxxx>
>
> But, I am not sure if we want to get rid of the preemption-caused batch
> map update failures on preemptive kernels, or if there are better
> solutions to address [0].

Thanks for your review. Under a preemptive kernel, since preemption is now
disabled during batch map lookup or update, htab_lock_bucket() called from
userspace will not fail, so I think it is OK for now.

>
> Tao, could you mention CONFIG_PREEMPT in your commit message? Thanks.

Will do.

>>>> Here is my theory, but please correct me if I'm wrong, I haven't
>>>> tested it yet. In non-RT, I doubt preemptions are likely to happen
>>>> after migrate_disable. That is because very soon after migrate_disable,
>>>> we enter the critical section of b->raw_lock with irqs disabled. In RT,
>>>> preemptions can happen on acquiring b->lock, that is certainly
>>>> possible, but this is the !use_raw_lock path, which isn't side-stepped
>>>> by this patch.
>>>>
>>>>>>> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>>>>>>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>>>> index 6c530a5e560a..ad09da139589 100644
>>>>>>> --- a/kernel/bpf/hashtab.c
>>>>>>> +++ b/kernel/bpf/hashtab.c
>>>>>>> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>>>>>>  				   unsigned long *pflags)
>>>>>>>  {
>>>>>>>  	unsigned long flags;
>>>>>>> +	bool use_raw_lock;
>>>>>>>
>>>>>>>  	hash = hash & HASHTAB_MAP_LOCK_MASK;
>>>>>>>
>>>>>>> -	migrate_disable();
>>>>>>> +	use_raw_lock = htab_use_raw_lock(htab);
>>>>>>> +	if (use_raw_lock)
>>>>>>> +		preempt_disable();
>>>>>>> +	else
>>>>>>> +		migrate_disable();
>>>>>>>  	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>>>>>>  		__this_cpu_dec(*(htab->map_locked[hash]));
>>>>>>> -		migrate_enable();
>>>>>>> +		if (use_raw_lock)
>>>>>>> +			preempt_enable();
>>>>>>> +		else
>>>>>>> +			migrate_enable();
>>>>>>>  		return -EBUSY;
>>>>>>>  	}
>>>>>>>
>>>>>>> -	if (htab_use_raw_lock(htab))
>>>>>>> +	if (use_raw_lock)
>>>>>>>  		raw_spin_lock_irqsave(&b->raw_lock, flags);
>>>>>>>  	else
>>>>>>>  		spin_lock_irqsave(&b->lock, flags);
>>>>>>> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>>>>>>>  				      struct bucket *b, u32 hash,
>>>>>>>  				      unsigned long flags)
>>>>>>>  {
>>>>>>> +	bool use_raw_lock = htab_use_raw_lock(htab);
>>>>>>> +
>>>>>>>  	hash = hash & HASHTAB_MAP_LOCK_MASK;
>>>>>>> -	if (htab_use_raw_lock(htab))
>>>>>>> +	if (use_raw_lock)
>>>>>>>  		raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>>>>>>>  	else
>>>>>>>  		spin_unlock_irqrestore(&b->lock, flags);
>>>>>>>  	__this_cpu_dec(*(htab->map_locked[hash]));
>>>>>>> -	migrate_enable();
>>>>>>> +	if (use_raw_lock)
>>>>>>> +		preempt_enable();
>>>>>>> +	else
>>>>>>> +		migrate_enable();
>>>>>>>  }
>>>>>>>
>>>>>>>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
>>>>>>> --
>>>>>>> 2.29.2
>>>>>>>
>>>>>> .
>>>> .
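
For completeness, this is roughly how htab_lock_bucket() ends up looking
with the diff above applied. The full signature and the trailing
*pflags/return lines are outside the quoted hunks and are assumed from the
surrounding hashtab.c source, so treat this as a sketch rather than the
exact file contents:

static inline int htab_lock_bucket(const struct bpf_htab *htab,
				   struct bucket *b, u32 hash,
				   unsigned long *pflags)
{
	unsigned long flags;
	bool use_raw_lock;

	hash = hash & HASHTAB_MAP_LOCK_MASK;

	/* Under !PREEMPT_RT the bucket uses a raw spinlock, so disable
	 * preemption (not just migration) before touching map_locked.
	 */
	use_raw_lock = htab_use_raw_lock(htab);
	if (use_raw_lock)
		preempt_disable();
	else
		migrate_disable();
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
		/* This bucket is already being updated on this CPU
		 * (e.g. from NMI context), so back off with -EBUSY
		 * instead of deadlocking on the bucket lock.
		 */
		__this_cpu_dec(*(htab->map_locked[hash]));
		if (use_raw_lock)
			preempt_enable();
		else
			migrate_enable();
		return -EBUSY;
	}

	if (use_raw_lock)
		raw_spin_lock_irqsave(&b->raw_lock, flags);
	else
		spin_lock_irqsave(&b->lock, flags);
	*pflags = flags;

	return 0;
}

With preemption disabled, nothing else can be scheduled on this CPU between
the map_locked increment and taking b->raw_lock, so two userspace updaters
pinned to the same CPU can no longer trip the -EBUSY check.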
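
The source of update.bin itself was snipped above, so for anyone who wants
to try this, below is a minimal, hypothetical reproducer along the same
lines. It is not the original program; the map parameters and the use of
libbpf's bpf_map_create()/bpf_map_update_elem() are my own assumptions.
Build it against libbpf (v0.7+ for bpf_map_create()), link with -lbpf
-lpthread, and run it pinned to one CPU on a CONFIG_PREEMPT kernel, e.g.
"sudo taskset -c 2 ./update.bin 16":

/* update.c - hypothetical sketch, not the original update.bin.
 * All threads update the same key (hence the same hash bucket) while the
 * whole process is pinned to one CPU, so a thread preempted between the
 * map_locked increment and the bucket lock makes the next updater fail
 * with -EBUSY (-16).
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <bpf/bpf.h>

static int map_fd;

static void *update_loop(void *arg)
{
	unsigned int key = 0, value = 1;

	for (;;) {
		if (!bpf_map_update_elem(map_fd, &key, &value, BPF_ANY))
			continue;
		/* -EBUSY (-16) means htab_lock_bucket() saw map_locked
		 * already raised on this CPU by a preempted updater.
		 */
		printf("update error %d\n", -errno);
		return NULL;
	}
}

int main(int argc, char **argv)
{
	int nr = argc > 1 ? atoi(argv[1]) : 2;
	pthread_t *tids;
	int i;

	/* map name, key/value sizes and max_entries are arbitrary choices */
	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "update_test",
				sizeof(unsigned int), sizeof(unsigned int),
				16, NULL);
	if (map_fd < 0) {
		perror("bpf_map_create");
		return 1;
	}

	printf("thread nr %d\n", nr);
	tids = calloc(nr, sizeof(*tids));
	for (i = 0; i < nr; i++)
		pthread_create(&tids[i], NULL, update_loop, NULL);

	printf("wait for error\n");
	for (i = 0; i < nr; i++)
		pthread_join(tids[i], NULL);
	printf("all threads exit\n");

	return 0;
}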