Re: [PATCH nf-next 3/8] nf_tables: Add set type for arbitrary concatenation of ranges

On Wed, 20 Nov 2019 16:06:09 +0100
Florian Westphal <fw@xxxxxxxxx> wrote:

> Stefano Brivio <sbrivio@xxxxxxxxxx> wrote:
> > +static bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
> > +			      const u32 *key, const struct nft_set_ext **ext)
> > +{
> > +	struct nft_pipapo *priv = nft_set_priv(set);
> > +	unsigned long *res_map, *fill_map;
> > +	u8 genmask = nft_genmask_cur(net);
> > +	const u8 *rp = (const u8 *)key;
> > +	struct nft_pipapo_match *m;
> > +	struct nft_pipapo_field *f;
> > +	bool map_index;
> > +	int i;
> > +
> > +	map_index = raw_cpu_read(nft_pipapo_scratch_index);  
> 
> I'm afraid this will need local_bh_disable to prevent reentry from
> softirq processing.

I'm afraid you're right, and not just here: everything from this point
to where we're done using the scratch maps and their index needs it.
Adding in v2.

At least the vectorised versions for x86, ARM and s390x won't see any
overhead from it, as they will already do that as part of
kernel_fpu_begin()/kernel_neon_begin().
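
For reference, this is roughly the shape I have in mind for v2 (just a
sketch, the exact extent of the protected section might still change):

	/* Prevent reentry from softirq while we own the per-CPU scratch maps */
	local_bh_disable();

	map_index = raw_cpu_read(nft_pipapo_scratch_index);

	[... matching loop, res_map/fill_map usage, scratch index flip ...]

	local_bh_enable();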

> > +	rcu_read_lock();  
> 
> All netfilter hooks run inside rcu read section, so this isn't needed.

Dropping in v2.

> > +static int pipapo_realloc_scratch(struct nft_pipapo_match *m,
> > +				  unsigned long bsize_max)
> > +{
> > +	int i;
> > +
> > +	for_each_possible_cpu(i) {
> > +		unsigned long *scratch;
> > +
> > +		scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2,
> > +				       GFP_KERNEL, cpu_to_node(i));
> > +		if (!scratch)
> > +			return -ENOMEM;  
> 
> No need to handle partial failures on the other cpu / no rollback?
> AFAICS ->destroy will handle it correctly, i.e. next insertion may
> enter this again and allocate a same-sized chunk, so AFAICS it's fine.

There's no need, because this is only called on insertion, so the new
scratch maps are always at least as big as the previous ones. If only
some of the allocations here succeed, a few CPUs end up with a bigger
allocated map, but the element isn't inserted and the extra room is
never used, because the caller won't update m->bsize_max.

> But still, it looks odd -- perhaps add a comment that there is no need
> to rollback earlier allocs.

Sure, added.
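
Something along these lines (exact wording in v2 may differ):

	scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2,
			       GFP_KERNEL, cpu_to_node(i));
	if (!scratch) {
		/* On partial failure, there's no need to roll back the
		 * allocations that already succeeded: the caller won't
		 * update bsize_max on error, so the extra room on those
		 * CPUs is simply not used, and a later insertion can
		 * reallocate it.
		 */
		return -ENOMEM;
	}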

> > +
> > +		kfree(*per_cpu_ptr(m->scratch, i));  
> 
> I was about to ask what would prevent nft_pipapo_lookup() from accessing
> m->scratch.  It's because "m" is the private clone.  Perhaps add a
> comment here to that effect.

I renamed 'm' to 'clone' and updated the kerneldoc header; I think
that's even clearer than a comment.
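
That is, roughly (the kerneldoc wording here is illustrative):

	/**
	 * pipapo_realloc_scratch() - Reallocate scratch maps for partial matches
	 * @clone:	Copy of matching data with pending insertions and deletions
	 * @bsize_max:	Maximum bucket size, scratch maps cover two buckets
	 */
	static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
					  unsigned long bsize_max)

Only the clone's scratch maps are replaced here, while lookups run on
the active copy, so there's nothing for nft_pipapo_lookup() to trip
over.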

> > + * @net:	Network namespace
> > + * @set:	nftables API set representation
> > + * @elem:	nftables API element representation containing key data
> > + * @flags:	If NFT_SET_ELEM_INTERVAL_END is passed, this is the end element
> > + * @ext2:	Filled with pointer to &struct nft_set_ext in inserted element
> > + *
> > + * In this set implementation, this function needs to be called twice, with
> > + * start and end element, to obtain a valid entry insertion. Calls to this
> > + * function are serialised, so we can store element and key data on the first
> > + * call with start element, and use it on the second call once we get the end
> > + * element too.  
> 
> What guarantees this?

Well, the only guarantee I'm relying on here is that the insert
function is not called concurrently for the same namespace, and as far
as I understand that comes from nf_tables_valid_genid(). However:

> AFAICS userspace could send a single element, with either
> NFT_SET_ELEM_INTERVAL_END, or only the start element.

this is all possible, and:

- for a single element with NFT_SET_ELEM_INTERVAL_END, we'll reuse the
  last 'start' element ever seen, or an all-zero key if no 'start'
  elements were seen at all

- for a single 'start' element, no element is added
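
In other words, sketching the insert path described in the kerneldoc
above (priv->start_data and priv->start_elem are illustrative names,
not necessarily the ones used in the patch):

	if (!(flags & NFT_SET_ELEM_INTERVAL_END)) {
		/* Start element: remember key and element, insert nothing.
		 * If no end element ever follows, the set isn't touched.
		 */
		memcpy(priv->start_data, elem->key.val.data, set->klen);
		priv->start_elem = elem->priv;
		return 0;
	}

	/* End element: insert the range using the stored start key, which
	 * is all-zeroes if no start element was ever seen.
	 */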

If the user chooses to configure firewalling with syzbot, my assumption
is that all we have to do is avoid crashing or leaking anything.

We could indeed opt to be stricter, by checking that a single netlink
batch contains a matching number of start and end elements. This can't
be done by the insert function, though: we don't have enough context
there.

A possible solution might be to implement a ->validate() callback
similar to what's done for chains -- or maybe export the context to
insert functions, so that we can tie elements to a portid/seq pair.

Do you think it's worth it? In some sense, this should already be all
consistent and safe.

-- 
Stefano




