Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/27, Jakub Sitnicki wrote:
On Wed, May 27, 2020 at 07:48 PM CEST, sdf@xxxxxxxxxx wrote:
> On 05/27, Jakub Sitnicki wrote:
>> Add support for bpf() syscall subcommands that operate on
>> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
>> network namespaces (that is flow dissector at the moment).
>
>> Link-based and prog-based attachment can be used interchangeably, but only
>> one can be in use at a time. Attempts to attach a link when a prog is
>> already attached directly, and the other way around, will be met with
>> -EBUSY.
>
>> Attachment of multiple links of same attach type to one netns is not
>> supported, with the intention to lift it when a use-case presents
>> itself. Because of that attempts to create a netns link, when one already >> exists result in -E2BIG error, signifying that there is no space left for
>> another attachment.
>
>> Link-based attachments to netns don't keep a netns alive by holding a ref >> to it. Instead links get auto-detached from netns when the latter is being
>> destroyed by a pernet pre_exit callback.
>
>> When auto-detached, link lives in defunct state as long there are open FDs
>> for it. -ENOLINK is returned if a user tries to update a defunct link.
>
>> Because bpf_link to netns doesn't hold a ref to struct net, special care is >> taken when releasing the link. The netns might be getting torn down when
>> the release function tries to access it to detach the link.
>
>> To ensure the struct net object is alive when release function accesses it
>> we rely on the fact that cleanup_net(), struct net destructor, calls
>> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from >> pre_exit happens first, link release will not attempt to access struct net.
>
>> Same applies the other way around, network namespace doesn't keep an
>> attached link alive because by not holding a ref to it. Instead bpf_links >> to netns are RCU-freed, so that pernet pre_exit callback can safely access
>> and auto-detach the link when racing with link release/free.
>
> [..]
>> +	rcu_read_lock();
>>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
>> -		if (rcu_access_pointer(net->bpf.progs[type]))
>> +		if (rcu_access_pointer(net->bpf.links[type]))
>> +			bpf_netns_link_auto_detach(net, type);
>> +		else if (rcu_access_pointer(net->bpf.progs[type]))
>>   			__netns_bpf_prog_detach(net, type);
>>   	}
>> +	rcu_read_unlock();
> Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach?
> Is it allowed under rcu_read_load?

Yes, that's true. __netns_bpf_prog_detach does

	RCU_INIT_POINTER(net->bpf.progs[type], NULL);

RCU read lock is here for the rcu_dereference() that happens in
bpf_netns_link_auto_detach (netns doesn't hold a ref to bpf_link):

/* Called with RCU read lock. */
static void __net_exit
bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
{
	struct bpf_netns_link *net_link;
	struct bpf_link *link;

	link = rcu_dereference(net->bpf.links[type]);
	if (!link)
		return;
	net_link = to_bpf_netns_link(link);
	RCU_INIT_POINTER(net_link->net, NULL);
}

I've pulled it up, out of the loop, perhaps too eagerly and just made it
confusing, considering we're iterating over a 1-item array :-)

Now, I'm also doing RCU_INIT_POINTER on the *contents of bpf_link* in
bpf_netns_link_auto_detach. Is that allowed? I'm not sure, that bit is
hazy to me.

There are no concurrent writers to net_link->net, just readers, i.e.
bpf_netns_link_release(). And I know bpf_link won't be freed until the
grace period elapses.

sparse and CONFIG_PROVE_RCU are not shouting at me, but please do if it
doesn't hold up or make sense.

I certainly can push the rcu_read_lock() down into
bpf_netns_link_auto_detach().
I think it would be much nicer if you push them down to preserve the
assumption that nothing is modified under read lock and you flip
the pointers only when holding the mutexes.

I'll do another pass on this patch, I think I don't understand a bunch
of bits where you do:

mutex_lock
rcu_read_lock -> why? you're already in the update section, can use
                 rcu_dereference_protected
...
rcu_read_unlock
mutex_unlock


But I'll post those comments inline shortly.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux