On 25 July 2018 at 18:50, bigeasy@xxxxxxxxxxxxx <bigeasy@xxxxxxxxxxxxx> wrote:
> On 2018-07-25 11:54:53 [+0200], Ard Biesheuvel wrote:
>> Indeed. OTOH, if the -rt people (Sebastian?) turn up and say that a
>> 1000 cycle limit to the quantum of work performed with preemption
>> disabled is unreasonably low, we can increase the yield block counts
>> and approach the optimal numbers a bit closer. But with diminishing
>> returns.
>
> So I tested on SoftIron Overdrive 1000 which has A57 cores. I added this
> series and didn't notice any spikes. This means cyclictest reported a
> max value of like ~20us (which means the crypto code was not
> noticeable).
> I played a little with it and tcrypt tests for aes/sha1 and also no huge
> spikes. So at this point this looks fantastic. I also setup cryptsetup /
> dm-crypt with the usual xts(aes) mode and saw no spikes.
> At this point, on this hardware if you want to raise the block count, I
> wouldn't mind.
>
> I remember on x86 the SIMD accelerated ciphers led to ~1ms+ spikes once
> dm-crypt started its jobs.
>

Thanks a lot.

So 20 us ~= 20,000 cycles on my 1 GHz Cortex-A53, and if I am
understanding you correctly, you wouldn't mind the quantum of work
being on the order of 16,000 cycles or even substantially more?

That is good news, but it is also rather interesting, given that these
algorithms run at ~4 cycles per byte, meaning that you'd manage an
entire 4 KB page without ever yielding. (GCM is used on network
packets, XTS on disk sectors, which are all smaller than that.)

Do you remember how you found out NEON use is a problem for -rt on
arm64 in the first place? Which algorithm did you test at the time to
arrive at this conclusion?

Note that AES-GCM using ordinary SIMD instructions runs at 29 cpb, and
plain AES at ~20 (on A53), so perhaps it would make sense to
distinguish between algos using crypto instructions and ones using
plain SIMD.
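
For reference, the cycle arithmetic in this mail can be checked with a minimal sketch. It is written in Python purely for illustration. The function names are hypothetical, and the inputs are just the figures quoted above (1 GHz clock, ~4 cpb with crypto instructions, 29 cpb for AES-GCM and ~20 cpb for plain AES using ordinary SIMD on A53). It ignores per-call overhead such as saving and restoring NEON state, so treat the results as rough estimates only.

```python
# Rough arithmetic only; the figures come from the mail, the names are hypothetical.

CLOCK_MHZ = 1000       # 1 GHz Cortex-A53, i.e. 1000 cycles per microsecond
PAGE_BYTES = 4096      # one 4 KB page


def cycles_for_latency(latency_us, clock_mhz=CLOCK_MHZ):
    """Number of CPU cycles that elapse in latency_us microseconds."""
    return latency_us * clock_mhz


def cycles_for_bytes(nbytes, cycles_per_byte):
    """Estimated cycles to process nbytes at a given cycles-per-byte rate."""
    return nbytes * cycles_per_byte


def bytes_per_quantum(cycle_budget, cycles_per_byte):
    """Whole bytes that fit in one non-preemptible quantum of cycle_budget cycles."""
    return cycle_budget // cycles_per_byte


if __name__ == "__main__":
    print("cycles in 20 us:", cycles_for_latency(20))
    for name, cpb in (("crypto insns", 4), ("plain AES SIMD", 20), ("AES-GCM SIMD", 29)):
        print(name,
              "| cycles per 4 KB page:", cycles_for_bytes(PAGE_BYTES, cpb),
              "| bytes per 1000-cycle quantum:", bytes_per_quantum(1000, cpb))
```

With these inputs, a 4 KB page at ~4 cpb costs 16,384 cycles (about 16 us at 1 GHz), while the same page at 29 cpb costs well over 100,000 cycles, which is the gap behind the suggestion to treat the two classes of algorithm differently.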