On Thu, 30 Jan 2020 09:02:08 -0800 Eric Dumazet <edumazet@xxxxxxxxxx> wrote:

> On Thu, Jan 30, 2020 at 4:41 AM <sjpark@xxxxxxxxxx> wrote:
> >
> > On Wed, 29 Jan 2020 09:52:43 -0800 Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > > On Wed, Jan 29, 2020 at 9:14 AM <sjpark@xxxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > >
> > > > We found races in the kernel code that incur latency spikes.  We thus would
> > > > like to share our investigations and hear your opinions.
> > > > [...]
> > >
> > > I would rather try to fix the issue more generically, without adding
> > > extra lookups as you did, since they might appear
> > > to reduce the race, but not completely fix it.
> > >
> > > For example, the fact that the client side ignores the RST and
> > > retransmits a SYN after one second might be something that should be
> > > fixed.
> >
> > I also agree with this direction.  It seems that detecting this situation and
> > adjusting the return value of tcp_timeout_init() to a value much lower than
> > one second would be a straightforward solution.  As a test, I modified the
> > function to return 1 (4ms for CONFIG_HZ=250) and confirmed that the reproducer
> > goes silent.  My next question is how we can detect this situation in the
> > kernel.  I'm unsure how we can distinguish this specific case from other cases,
> > as everything is working as normal according to the TCP protocol.
> >
> > Also, it seems the value is made adjustable from user space using the
> > bpf callback, BPF_SOCK_OPS_TIMEOUT_INIT:
> >
> >     BPF_SOCK_OPS_TIMEOUT_INIT,  /* Should return SYN-RTO value to use or
> >                                  * -1 if default value should be used
> >                                  */
> >
> > Thus, it sounds like you are suggesting to do the detection and adjustment from
> > user space.  Am I understanding your point?  If not, please let me know.
> >
>
> No, I was suggesting to implement a mitigation in the kernel:
>
> When in SYN_SENT state, receiving a suspicious ACK should not
> simply trigger a RST.
>
> There are multiple ways maybe to address the issue.
>
> 1) Abort the SYN_SENT state and let user space receive an error to its
> connect() immediately.
>
> 2) Instead of a RST, allow the first SYN retransmit to happen immediately
> (This is kind of a challenge SYN. Kernel already implements challenge acks)
>
> 3) After RST is sent (to hopefully clear the state of the remote),
> schedule a SYN rtx in a few ms,
> instead of ~ one second.

Thank you for this kind comment, Eric!  I would prefer the second and third
ideas over the first one.  Anyway, I will send a patch soon, and will add a
kselftest for this case, too.


Thanks,
SeongJae Park

[...]
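
For reference, the timing arithmetic discussed in this thread (a return value
of 1 jiffy being 4ms with CONFIG_HZ=250) and the shape of idea 3 can be
sketched in plain user-space C.  This is only an illustration under stated
assumptions, not kernel code and not the eventual patch: the names
jiffies_to_ms(), ms_to_jiffies(), syn_rtx_delay(), FAST_SYN_RTX_MS and the
rst_sent_for_suspicious_ack flag are all hypothetical, and 4ms is just one
possible reading of "a few ms".

```c
#include <assert.h>
#include <stdbool.h>

/* Default initial SYN retransmission timeout, one second as in the thread. */
#define DEFAULT_SYN_RTO_MS	1000U
/* Assumed shortened delay for idea 3; "a few ms" in the thread. */
#define FAST_SYN_RTX_MS		4U

/* Convert a jiffies count to milliseconds for a given tick rate (CONFIG_HZ). */
static unsigned int jiffies_to_ms(unsigned long jiffies, unsigned int hz)
{
	assert(hz > 0);
	return (unsigned int)(jiffies * 1000UL / hz);
}

/*
 * Convert milliseconds to jiffies, rounding up so that a nonzero delay
 * never collapses to zero ticks on a low tick rate.
 */
static unsigned long ms_to_jiffies(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999UL) / 1000UL;
}

/*
 * Hypothetical policy for idea 3: if a RST was just sent in response to a
 * suspicious ACK while in SYN_SENT, retransmit the SYN after a few
 * milliseconds; otherwise keep the default one-second timeout.
 */
static unsigned long syn_rtx_delay(bool rst_sent_for_suspicious_ack,
				   unsigned int hz)
{
	if (rst_sent_for_suspicious_ack)
		return ms_to_jiffies(FAST_SYN_RTX_MS, hz);
	return ms_to_jiffies(DEFAULT_SYN_RTO_MS, hz);
}
```

With hz set to 250, syn_rtx_delay(true, 250) yields 1 jiffy (4ms), which
matches the test value above, while syn_rtx_delay(false, 250) keeps the
default of 250 jiffies.  How the kernel would actually set such a flag, that
is, how it detects the suspicious ACK case, is the open question of this
thread and is not addressed by the sketch.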