Re: Numen with reference to vmap

Phil Sutter <phil@xxxxxx> · Wed, 4 Dec 2019 23:32:15 +0100

Hi Arturo,

On Wed, Dec 04, 2019 at 06:31:02PM +0100, Arturo Borrero Gonzalez wrote:
> On 12/4/19 4:56 PM, Phil Sutter wrote:
> > OK, static load-balancing between two services - no big deal. :)
> > 
> > What happens if config changes? I.e., if one of the endpoints goes down
> > or a third one is added? (That's the thing we're discussing right now,
> > aren't we?)
> 
> if the non-anon map for random numgen was allowed, then only elements would need
> to be adjusted:
> 
> dnat numgen random mod 100 map { 0-49 : 1.1.1.1, 50-99 : 2.2.2.2 }
> 
> You could always use mod 100 (or 10000 if you want) and just play with the map
> probabilities by updating map elements. This is a valid use case I think.
> The mod number can just be the max number of allowed endpoints per service in
> kubernetes.
> 
> @Phil,
> 
> I'm not sure if the typeof() thingy will work in this case, since the integer
> length would depend on the mod value used.
> What about introducing something like an explicit u128 integer datatype. Perhaps
> it's useful for other use cases too...

Out of curiosity I implemented the bits to support typeof keyword in
parser and scanner. It's a bit clumsy, but it works. I can do:

| nft add map t m2 '{ type typeof numgen random mod 2 : verdict; }'

(The 'random mod 2' part is ignored, but needed as otherwise it's not a
primary_expr. :D)

The output is:

| table ip t {
| 	map m2 {
| 		type integer : verdict
| 	}
| }

So integer size information is lost, this won't work when fed back.
There are two options to solve this:

A) Push expression info into kernel so we can correctly deserialize the
   original input.

B) As you suggested, have something like 'int32' or maybe better 'int(32)'.

I consider (B) to be way less ugly. And if we went that route, we could
actually use the 'int32'/'int(32)' thing in the first place. All users
have to know is how large is 'numgen' data type. Or we're even smart
here, taking into account that such a map may be used with different
inputs and mask input to fit map key size. IIRC, we may even have had
this discussion in an inconveniently cold room in Malaga once. :)

> @Serguei,
> 
> kubernetes implements a complex chain of mechanisms to deal with traffic. What
> happens if endpoints for a given svc have different ports? I don't know if
> that's supported or not, but then this approach wouldn't work either: you can't
> use dnat numgen randmo { 0-49 : <ip>:<port> }.
> 
> Also, we have the masquerade/drop thing going on too, which needs to be deal
> with and that currently is done by yet another chain jump + packet mark.
> 
> I'm not sure in which state of the development you are, but this is my
> suggestion: Try to don't over-optimize in the first iteration. Just get a
> working nft ruleset with the few optimization that make sense and are easy to
> use (and understand). For iteration #2 we can do better optimizations, including
> patching missing features we may have in nftables.
> I really want a ruleset with very little rules, but we are still comparing with
> the iptables ruleset. I suggest we leave the hard optimization for a later point
> when we are comparing nft vs nft rulesets.

+1 for optimize not (yet). At least there's a certain chance that we're
spending much effort into optimizing a path which isn't even the
bottleneck later.

Cheers, Phil