On Fri, Jan 05, 2018 at 07:02:35PM +0100, Greg KH wrote:
> On Fri, Jan 05, 2018 at 04:55:07PM +0100, Willy Tarreau wrote:
> > On Fri, Jan 05, 2018 at 03:54:33PM +0100, Greg KH wrote:
> > > I'm announcing the release of the 4.4.110 kernel.
> > >
> > > All users of the 4.4 kernel series must upgrade.
> > >
> > > But be careful, there have been some reports of problems with this
> > > release during the -rc review cycle. Hopefully all of those issues are
> > > now resolved.
> > >
> > > So please test, as of right now, it should be "bug compatible" with the
> > > "enterprise" kernel releases with regards to the Meltdown bug and proper
> > > support on all virtual platforms (meaning there is still a vdso issue
> > > that might trip up some old binaries, again, please test!)
> > >
> > > If anyone has any problems, please let me know.
> >
> > FWIW I've just booted one of our LBs on it and am hammering it at full
> > load with pti enabled and will let it run for the week-end. It takes
> > 860k irq/s and about 1.7M syscalls/s. For now it works well (but slowly).
> > Hopefully if there are any rare race conditions left it has a chance to
> > trigger them.
>
> Thanks for the testing, let me know if you see anything.

Definitely! For now zero errors after almost one billion connections,
around 15 billion syscalls and 7B irqs.

> And "slowly", does that mean it is noticeable?

It depends for whom :-)  We benchmarked this machine a while ago at 93k
connections per second on 4.9 on a single process and now I'm seeing
about 60k for a single process. I don't want to digress too much about
numbers now as the test conditions certainly differ a bit, I'll have to
rerun more detailed ones later. For 99.9% of the users it will not be
noticeable. Those having to fight DDoS will certainly notice it. I'm
pretty sure we'll run with pti=off at least at the beginning.

> I have some queries from the virtual
> networking people that are getting worried about all of this.
> I told them to go test, but they were having a hard time finding a
> kernel to test with. Hopefully we hear back from them now that these
> are out...

I've tested and found a 40% perf drop on networking under KVM between
pti=off and pti=on :-(  Fortunately in our case, people running in VMs
are not those interested in performance (that's commonly the case) but I
expect it will impact some high-performance users who tune their VMs
very precisely.

I'm currently testing a completely different approach for systems like
these running basically a single task. The idea is to limit rdtsc to
privileged processes only. I just discovered that my libc happily uses
it in ld.so so that limits my capabilities for now :-)  But implementing
an emulator could solve this for non-privileged processes, masking the
lower bits and losing precision. It would not be a fix but an acceptable
mitigation for some environments where pti=on is too expensive and
where untrusted users are extremely rare (i.e. just the remote cron job
checking disk space and collecting network stats).

I already tested the variant of the spectre poc without rdtsc (using a
thread and a counter) and it definitely is not something reasonably
usable to steal reliable information anymore: I managed to get around
one byte in ten right, but you never know which one. For this reason,
people considering pti=off as the only solution might sometimes prefer
this one as a small improvement (and it could also stop other classes of
future attacks, maybe something for KSPP later).

I'll continue to investigate and share my observations.

Have a nice week-end!
Willy
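
PS: a rough illustration of the "masking the lower bits" idea, only to
show the arithmetic. This is not the emulator itself; the function names
and the choice of 12 dropped bits are assumptions made up for this
sketch, and a real implementation would live in a trap handler, not in
Python.

```python
TSC_MASK_64 = (1 << 64) - 1  # rdtsc yields a 64-bit counter value


def coarsen_tsc(raw_tsc: int, dropped_bits: int = 12) -> int:
    """Return raw_tsc with its low `dropped_bits` bits cleared.

    Hypothetical helper: `dropped_bits` is an illustrative tuning knob,
    not a value taken from any real kernel interface.
    """
    if not 0 <= dropped_bits < 64:
        raise ValueError("dropped_bits must be in [0, 63]")
    low_mask = (1 << dropped_bits) - 1
    return (raw_tsc & TSC_MASK_64) & ~low_mask


def measured_delta(start: int, end: int, dropped_bits: int = 12) -> int:
    """Elapsed time as seen by a process that only gets coarsened reads."""
    return coarsen_tsc(end, dropped_bits) - coarsen_tsc(start, dropped_bits)
```

With 12 bits dropped, two reads that fall in the same 4096-cycle bucket
give a delta of 0, and reads that straddle a bucket boundary give 4096
regardless of the real distance, so short timing differences are no
longer directly measurable this way.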