On Tue, May 29, 2018 at 7:45 PM, Xin Long <lucien.xin@xxxxxxxxx> wrote:
> On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner
> <marcelo.leitner@xxxxxxxxx> wrote:
>> On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote:
>>> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner <
>>> marcelo.leitner@xxxxxxxxx> wrote:
>>> > - patch2 - fix rtx attack vector
>>> >   - Add the floor value to rto_min to HZ/20 (which fits the values
>>> >     that Michael shared on the other email)
>>>
>>> I would encourage allowing minimum RTO values down to 5ms, if the ACK
>>> policy in the receiver makes this feasible. Our experience is that in
>>> datacenter environments it can be advantageous to allow timer-based loss
>>> recoveries using timeout values as low as 5ms, e.g.:
>>
>> Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at
>> ~25ms already. Xin, can you share more details on the hw, which CPU
>> was used?

Hi,

Did we reach any decision on this? This continues to produce bug
reports on syzbot.

I am not sure whom you are asking, because Xin is you unless I am
missing something :)
But if you mean the syzbot hardware, it is GCE VMs with modern Intel
CPUs. An important aspect, though, is the heavy-debug config (which you
can take from here:
https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f) and
systematic bug reporting. So if this is at all flaky in your testing,
it will produce dozens of bug emails on syzbot.

> It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2"
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 2
> On-line CPU(s) list: 0,1
> Thread(s) per core: 1
> Core(s) per socket: 1
> Socket(s): 2
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 13
> Model name: QEMU Virtual CPU version 1.5.3
> Stepping: 3
> CPU MHz: 2397.222
> BogoMIPS: 4794.44
> Hypervisor vendor: KVM
> Virtualization type: full
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 4096K
> NUMA node0 CPU(s): 0,1
> Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good
> nopl cpuid pni cx16 hypervisor lahf_lm abm pti
>
> If we're counting on max_t to fix this CPU stuck. It should not that
> matter if min rto < the value causing that stuck.
>
>>
>> Anyway, what about we add a floor to rto_max too, so that RTO can
>> actually grow into something bigger that don't hog the CPU? Like:
>> rto_min floor = 5ms
>> rto_max floor = 50ms
>>
>> Marcelo
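
To make the floor proposal quoted above concrete, here is a rough
userspace sketch of what clamping with max_t could look like. It is
not the SCTP code: the struct, the helper names, and the millisecond
units are all made up for illustration, and max_t here is a simplified
stand-in for the kernel macro. The 5ms and 50ms values are the ones
Marcelo suggested.

```c
#include <assert.h>

/* Floors proposed in the thread, in milliseconds (assumed units). */
#define RTO_MIN_FLOOR_MS 5u
#define RTO_MAX_FLOOR_MS 50u

/* Simplified userspace stand-in for the kernel's max_t(). */
#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Hypothetical container for the two limits; not a kernel struct. */
struct rto_cfg {
	unsigned int rto_min_ms;
	unsigned int rto_max_ms;
};

/* Raise user-supplied limits to the floors and keep rto_max >= rto_min. */
static struct rto_cfg apply_rto_floors(unsigned int rto_min_ms,
				       unsigned int rto_max_ms)
{
	struct rto_cfg c;

	c.rto_min_ms = max_t(unsigned int, rto_min_ms, RTO_MIN_FLOOR_MS);
	c.rto_max_ms = max_t(unsigned int, rto_max_ms, RTO_MAX_FLOOR_MS);
	c.rto_max_ms = max_t(unsigned int, c.rto_max_ms, c.rto_min_ms);
	return c;
}

/* Double the RTO after a timeout, bounded by [rto_min, rto_max]. */
static unsigned int backoff_rto(unsigned int rto_ms, const struct rto_cfg *c)
{
	unsigned int next = rto_ms * 2;

	assert(c->rto_min_ms <= c->rto_max_ms);
	if (next > c->rto_max_ms)
		next = c->rto_max_ms;
	if (next < c->rto_min_ms)
		next = c->rto_min_ms;
	return next;
}
```

With these floors, a request for rto_min=1ms, rto_max=10ms ends up as
5ms/50ms, so repeated timeouts can still back off 5 -> 10 -> 20 -> 40
-> 50 instead of staying pinned at a tiny value.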