Re: [PATCH 00/31] net/tcp: Add TCP-AO support

Dmitry Safonov <dima@xxxxxxxxxx> · Tue, 23 Aug 2022 17:31:22 +0100

On 8/23/22 16:30, Leonard Crestez wrote:
> On 8/22/22 23:35, Dmitry Safonov wrote:
>> Hi Leonard, David,
[..]
>> At this particular moment, it seems neither of patch sets is ready to be
>> merged "as-is". But it seems that there's enough interest from both
>> sides and likely it guarantees that there will be enough effort to make
>> something merge-able, that will work for all interested parties.
>>
>> As for my part, I'm interested in the best code upstream, regardless who
>> is the author. This includes:
>> - reusing the existing TCP-MD5 code, rather than copying'n'pasting for
>>    TCP-AO with intent to refactor it some day later
> 
> I had a requirement to deploy on linux 5.4 so I very deliberately
> avoided touching MD5. I'm not sure there very much duplication anyway.

Yeah, I know what you mean: we deployed it on v4.19. But for the code
upstream I personally prefer to see "reusing" rather than copying.
Lesser code is easier to maintain in future.
Upstream submissions in my view should be based on "what would be easier
to maintain in future", rather than on "what would be easier to backport
to my maintenance release".

>> - making setsockopt()s and other syscalls extendable
>> - cover functionality with selftests
> 
> My implementation is tested with a standalone python package, this is a
> design choice which doesn't particularly matter.
> 
>> - following RFC5925 in implementation, especially "required" and "must"
>>    parts
> 
> I'm not convinced that "don't delete current key" needs to be literally
> interpreted as a hard requirement for the linux ABI. Most TCP RFCs don't
> specify any sort of API at all and it would be entirely valid to
> implement BGP-TCP-AO as a single executable with no internally
> documented boundaries.

I agree that RFC requirements and "musts" can be implemented in
userspace, rather than in kernel. On the other hand, my opinion is that
if you have "must"/"must not"/"required" in RFC and it's not hard to
limit those in kernel, than you _should_ do it.
In this point of view, debugging "hey, setsockopt() for key removal
returned -EBUSY, what's going on?" is better than "hey, tcp connection
died on my side and I didn't have tcp dump running, what was that?".

>> I hope that clarifies how and why now there are two patch sets that
>> implement the same RFC/functionality.
> 
> As far as I can tell the biggest problem is that is quite difficult to
> implement the userspace side of TCP-AO complete with key rollover. Our
> two implementation both claim to support this but through different ABI
> and both require active management from userspace.
> 
> I think it would make sense to push key validity times and the key
> selection policy entirely in the kernel so that it can handle key
> rotation/expiration by itself. This way userspace only has to configure
> the keys and doesn't have to touch established connections at all.

Respectfully I disagree here. I think all such policies should be
implemented in userspace. The kernel has to have as lesser as possible,
but enough to provide a way to sign, verify, log messages on TCP segments.
All the logic that may change, all business decisions and key management
should be implemented in userspace, keeping the kernel part as easier in
"KISS" sense as possible.

> My series has a "flags" field on the key struct where it can filter by
> IP, prefix, ifindex and so on. It would be possible to add additional
> flags for making the key only valid between certain times (by wall time).
> 
> The kernel could then make key selections itself:
>  - send rnextkeyid based on the key with the longest recv lifetime
>  - send keyid based on remote rnextkeyid.
>    - If not applicable (rnextkeyid not found locally, or for SYN) pick
> based on longest send lifetime.
>  - If all keys expire then return an error on write()
>  - Solve other ambiguities in a predictable way: if valid times are
> equal then pick the lowest numeric send_id or recv_id.
> 
> Explicit key selection from userspace could still be supported but it
> would be optional and most apps wouldn't bother implementing their own
> policy. The biggest advantage is that it would be much easier for
> applications to adopt TCP-AO.

Personally, I would think that all you mentioned better stay in
userspace app. The kernel should do as minimal and as much predictable
as possible job here, without 10 possible outcomes. If you want to share
the logic of key rotation/expiration, all timers and synchronization
between different BGP applications, that would be a proper job for a
shared library.

Thanks,
          Dmitry