On 2018-07-26 09:25:40 [+0200], Ard Biesheuvel wrote:
> Thanks a lot.
>
> So 20 us ~= 20,000 cycles on my 1 GHz Cortex-A53, and if I am
> understanding you correctly, you wouldn't mind the quantum of work to
> be in the order 16,000 cycles or even substantially more?

I currently have only that one box, and it does not seem to be a problem
there. At idle it now reports a maximum of around 20us. So if we add
"only" 20us for the NEON / your preempt-disable section, we may end up at
20+20 = 40us. At this point I am not sure how "bad" that is. It works, it
does not seem like much, and you can disable it if you don't want the
extra 20us here.

> That is good news, but it is also rather interesting, given that these
> algorithms run at ~4 cycles per byte, meaning that you'd manage an
> entire 4 KB page without ever yielding. (GCM is used on network
> packets, XTS on disk sectors which are all smaller than that)
>
> Do you remember how you found out NEON use is a problem for -rt on
> arm64 in the first place? Which algorithm did you test at the time to
> arrive at this conclusion?

I *think* that yield got in there by chance. The main problem back then
was that the scatterlist walk happened inside the neon begin/end section.
That walk may invoke kmap() / kmalloc() / kfree(), which is not allowed
on RT within a preempt-disable section. This was my main concern.

> Note that AES-GCM using ordinary SIMD instructions runs at 29 cpb, and
> plain AES at ~20 (on A53), so perhaps it would make sense to
> distinguish between algos using crypto instructions and ones using
> plain SIMD.

I was looking at AES-CE and AES-NEON (aes-neon-blk / aes_ce_blk) with
  modprobe tcrypt mode=200 sec=1
and with mode=403 and mode=404 for the sha1/sha256 tests.

Sebastian