On Mon, Oct 16, 2023 at 12:46 PM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote:
>
> From: Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx>
> Date: Mon, 16 Oct 2023 10:19:18 -0400
> > Kuniyuki Iwashima wrote:
> > > Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless
> > > for the connection request until a valid ACK is responded to the SYN+ACK.
> > >
> > > The cookie contains two kinds of host-specific bits, a timestamp and
> > > secrets, so only can it be validated by the generator. It means SYN
> > > Cookie consumes network resources between the client and the server;
> > > intermediate nodes must remember which nodes to route ACK for the cookie.
> > >
> > > SYN Proxy reduces such unwanted resource allocation by handling 3WHS at
> > > the edge network. After SYN Proxy completes 3WHS, it forwards SYN to the
> > > backend server and completes another 3WHS. However, since the server's
> > > ISN differs from the cookie, the proxy must manage the ISN mappings and
> > > fix up SEQ/ACK numbers in every packet for each connection. If a proxy
> > > node is down, all the connections through it are also down. Keeping a
> > > state at proxy is painful from that perspective.
> > >
> > > At AWS, we use a dirty hack to build truly stateless SYN Proxy at scale.
> > > Our SYN Proxy consists of the front proxy layer and the backend kernel
> > > module. (See slides of netconf [0], p6 - p15)
> > >
> > > The cookie that SYN Proxy generates differs from the kernel's cookie in
> > > that it contains a secret (called rolling salt) (i) shared by all the proxy
> > > nodes so that any node can validate ACK and (ii) updated periodically so
> > > that old cookies cannot be validated. Also, ISN contains WScale, SACK, and
> > > ECN, not in TS val. This is not to sacrifice any connection quality, where
> > > some customers turn off the timestamp option due to retro CVE.
> >
> > If easier: I think it should be possible to make the host secret
> > readable and writable with CAP_NET_ADMIN, to allow synchronizing
> > between hosts.
>
> I think the idea is doable for syncookie_secret and syncookie6_secret.
> However, the cookie timestamp is generated based on jiffies that cannot
> be written.
>
> [ I answered sharing secrets would resolve our issue at netconf, but
> I was wrong. ]
> >
> > For similar reasons as suggested here, a rolling salt might be
> > useful more broadly too.
>
> Maybe we need not use jiffies and can create a worker to update the
> secret periodically if it's not configured manually.
>
> The problem here would be that we need to update/read u64[4] atomically
> if we want to use SipHash or HSipHash. Maybe this also can be changed.
>
> But, we still want to use BPF as we need to encode (at least) WS and
> SACK bits in ISN, not TS and use different MSS candidates rather than
> msstab.
>
> Also, in our use case, the validation for cookie itself is done in
> the front proxy layer, and the kernel will do more light-weight
> validation like checking if the cookie is forwarded from trusted
> nodes. Then, we can prevent invalid ACK from flowing through the
> backend and consuming some networking entries, and the backend need
> not do full validation.
>
> With BPF, we can get such flexibility at encoding and validation, and
> making cookie generation algorithm private could be good for security.

Thanks for that context. Sounds like it indeed would not be a small
change to support your use case in the existing syncookie C code, then.
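
To make the discussion concrete, here is a rough user-space sketch in Python of the kind of cookie described above: a 32-bit ISN that carries MSS, WScale, SACK and ECN bits directly and is keyed by a shared secret that rolls over time. Everything in it is an assumption for illustration only: the bit layout, the `MSS_TABLE` values, the `ROLL_PERIOD`, and the function names are hypothetical, and HMAC-SHA256 stands in for SipHash/HSipHash because the Python standard library has no SipHash API. It is not the layout used by the kernel or by the proxy described in this thread.

```python
import hashlib
import hmac
import struct

# Hypothetical MSS candidates (NOT the kernel's msstab) and roll period.
MSS_TABLE = (536, 1220, 1440, 1460)
ROLL_PERIOD = 64  # seconds per salt epoch (assumed value)


def pack_opts(mss_idx, wscale, sack, ecn):
    """Pack negotiated options into 8 bits: MSS idx(2) | WScale(4) | SACK(1) | ECN(1)."""
    return (mss_idx & 0x3) << 6 | (wscale & 0xF) << 2 | (sack & 1) << 1 | (ecn & 1)


def _mac24(secret, epoch, flow, opts):
    """24-bit keyed MAC over the flow, the salt epoch and the option bits."""
    saddr, daddr, sport, dport, client_isn = flow
    msg = struct.pack("!Q4s4sHHIB", epoch, saddr, daddr, sport, dport,
                      client_isn, opts)
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")


def make_cookie(secret, now, flow, opts):
    """Build a 32-bit ISN: upper 24 bits are the MAC, lower 8 bits the options."""
    epoch = now // ROLL_PERIOD
    return (_mac24(secret, epoch, flow, opts) << 8) | opts


def check_cookie(secret, now, flow, cookie):
    """Return (mss, wscale, sack, ecn) if the cookie is valid, else None.

    The current and the previous epoch are accepted, so a cookie stays
    valid for between one and two roll periods.
    """
    opts = cookie & 0xFF
    got = ((cookie >> 8) & 0xFFFFFF).to_bytes(3, "big")
    epoch = now // ROLL_PERIOD
    for e in (epoch, epoch - 1):
        if e < 0:
            continue
        want = _mac24(secret, e, flow, opts).to_bytes(3, "big")
        if hmac.compare_digest(got, want):
            return (MSS_TABLE[opts >> 6], (opts >> 2) & 0xF,
                    bool(opts & 2), bool(opts & 1))
    return None
```

Any node holding the same secret can validate a cookie without per-connection state, and because the option bits are covered by the MAC, flipping them invalidates the cookie. A 24-bit MAC is of course much weaker than what a real deployment would want; the sketch only shows where the bits could go.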