Re: [PATCH bpf v2 1/3] bpf: Pin iterator link when opening iterator

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Wed, 16 Nov 2022 15:24:44 +0800



Hi,

On 11/16/2022 10:48 AM, Hao Luo wrote:
> On Tue, Nov 15, 2022 at 5:37 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
>> On Tue, Nov 15, 2022 at 11:16 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>>> On 11/10/22 10:34 PM, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>>
>>>> For many bpf iterator (e.g., cgroup iterator), iterator link acquires
>>>> the reference of iteration target in .attach_target(), but iterator link
>>>> may be closed before or in the middle of iteration, so iterator will
>>>> need to acquire the reference of iteration target as well to prevent
>>>> potential use-after-free. To avoid doing the acquisition in
>>>> .init_seq_private() for each iterator type, just pin iterator link in
>>>> iterator.
>>> iiuc, a link currently will go away when all its fds closed and pinned file
>>> removed.  After this change, the link will stay until the last iter is closed().
>>>   Before then, the user space can still "bpftool link show" and even get the
>>> link back by bpf_link_get_fd_by_id().  If this is the case, it would be useful
>>> to explain it in the commit message.
>>>
>>> and does this new behavior make sense when comparing with other link types?
> I think this is a unique problem in iter link. Because iter link is
> the only link type that can generate another object.
>
>> One more question to the above...
>>
>> Does this change mean that pinned cgroup iterator in bpffs
>> would prevent cgroup removal?
> Yes, when attaching the program to cgroup, the cgroup iter link gets
> an extra ref of the cgroup. It puts that ref when detach.
It seems we can not move the pinning of cgroup into cgroup_iter_seq_init(),
because the representation of the cgroup could be a fd and the fd will not be
accessible when iterator link is pinned in bpffs.
>
>> So that cgroup cannot even become a dying cgroup ?
>>
> No. The cgroup will become offline and its corresponding kernfs node
> will be removed. The cgroup object is still accessible.
>
>> If so we can do that and an approach similar to init_seq_private
>> taken for map iterators is necessary here as well.
>>
>> Also pls target this kind of change to bpf-next especially
>> when there is a consideration to revert other fixes.
>> This kind of questionable fixes are not suitable for bpf tree
>> regardless of how long the "bug" was present.
> .