Re: [PATCH bpf v3 4/6] bpf: Optimize the free of inner map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alexei,

On 11/27/2023 9:49 AM, Alexei Starovoitov wrote:
> On Fri, Nov 24, 2023 at 3:29 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> When removing the inner map from the outer map, the inner map will be
>> freed after one RCU grace period and one RCU tasks trace grace
>> period, so it is certain that the bpf program, which may access the
>> inner map, has exited before the inner map is freed.
>>
>> However there is unnecessary to wait for any RCU grace period, one RCU
>> grace period or one RCU tasks trace grace period if the outer map is
>> only accessed by userspace, sleepable program or non-sleepable program.
>> So recording the sleepable attributes of the owned bpf programs when
>> adding the outer map into env->used_maps, copying the recorded
>> attributes to inner map atomically when removing inner map from the
>> outer map and using the recorded attributes in the inner map to decide
>> which, and how many, RCU grace periods are needed when freeing the
>> inner map.
>>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> +       if (deferred) {
>> +               /* used_in_rcu_gp may be updated concurrently by new bpf
>> +                * program, so add smp_mb() to guarantee the order between
>> +                * used_in_rcu_gp and lookup/deletion operation of inner map.
>> +                * If a new bpf program finds the inner map before it is
>> +                * removed from outer map, reading used_in_rcu_gp below will
>> +                * return the newly-set bit set by the new bpf program.
>> +                */
>> +               smp_mb();
>> +               atomic_or(atomic_read(&map->used_in_rcu_gp), &inner_map->free_by_rcu_gp);
> You resent the patches before I had time to reply to the previous thread...

I didn't receive the reply from last Thursday,  so I though there is no
new suggestions from you and sent out v3 which incorporate suggestions
from Martin. I will ping next time before sending out new version if
there is still pending discussion about the posted patch-set.
>
>> I think the main reason is that there is four possible case for the free
>> of inner map:
>> (1) neither call synchronize_rcu() nor synchronize_rcu_tasks_trace()
>> It is true when the outer map is only being accessed in user space.
>> (2) only call synchronize_rcu()
>> the outer map is only being accessed by non-sleepable bpf program
>> (3) only call synchronize_rcu_tasks_trace
>> the outer map is only being accessed by sleepable bpf program
>> (4) call both synchronize_rcu() and synchronize_rcu_tasks_trace()
>>
>> Only using sleepable_refcnt can not express 4 possible cases and we also
>> need to atomically copy the states from outer map to inner map, because
>> one inner map may be used concurrently by multiple outer map, so atomic
>> or mask are chosen.
> We don't care about optimizing 1, since it's rare case.
> We also don't care about optimizing 3, since sync_rcu time is negligible
> when we need to wait for sync_rcu_tasks_trace and also
> because rcu_trace_implies_rcu_gp() for foreseeable future.

I see.
>
>> need to atomically
> we do NOT have such need.

Here the atomicity means that multiple updates to
inner_map->free_in_rcu_gp should not overwrite each other and it has
nothing to do with the memory barrier. And using spin_lock will serve
the same purpose.
> There is zero need to do this atomic games and barries "just want to
> be cautious". The code should not have any code at all "to be
> cautious".
> Every barrier has to be a real reason behind it.
> Please remove them.

Will remove the memory barrier. I think we can add it later if it is needed.
> The concurent access to refcnt and sleepable_refcnt can be serialized
> with simple spin_lock.

OK.
>
> I also don't like
>> +       BPF_MAP_RCU_GP = BIT(0),
>> +       BPF_MAP_RCU_TT_GP = BIT(1),
> the names should not describe what action should be taken.
> The names should describe what the bits do.

Understood.
> I still think sleepable_refcnt and refcnt is more than enough to
> optimize patch 3.

Before posting v4,  do the following code-snippets match your suggestions ?

resolve_pseudo_ldimm64()
                        if (env->prog->aux->sleepable)
                               /* The max number of program is INT_MAX -
1, so atomic_t will be enough */
                                atomic_inc(&map->sleepable_refcnt);

bpf_map_fd_put_ptr()
        if (deferred) {
                if (atomic_read(&map->sleepable_refcnt))
                        WRITE_ONCE(map->free_after_tt_rcu_gp, true);
                else
                        WRITE_ONCE(map->free_after_rcu_gp, true);
        }

bpf_map_put()
                if (READ_ONCE(map->free_after_tt_rcu_gp))
                        call_rcu_tasks_trace(&map->rcu,
bpf_map_free_mult_rcu_gp);
                else if (READ_ONCE(map->free_after_rcu_gp))
                        call_rcu(&map->rcu, bpf_map_free_rcu_gp);
                else
                        bpf_map_free_in_work(map);

And are you OK with using call_rcu()/call_rcu_tasks_trace() instead of
synchronize_rcu()/synchronize_rcu_mult() ?





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux