On Thu, Sep 27, 2018 at 6:27 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> I would add another consideration: if you can get better latency with negligible overhead (0.1%? 0.05%), then that might make sense too. For example, it seems plausible that checking need_resched() every few blocks adds basically no overhead, and the SIMD helpers could do this themselves or perhaps only ever do a block at a time.
>
> need_resched() costs a cacheline access, but it’s usually a hot cacheline, and the actual check is just whether a certain bit in memory is set.

Yes, you're right. I do plan to check quite often, rather than seldom, for this reason. I've been toying with the idea of instead processing 65k (maximum size of a UDP packet) at a time before checking need_resched(), but armed with the 20µs figure, this isn't remotely possible on most hardware. So I'll stick with the original conservative plan of checking very often, and not diverge from what the present crypto API has already worked out in this regard.
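
To make the trade-off above concrete, here is a minimal userspace C sketch, not kernel code and not the actual implementation. It shows two things: the throughput needed to finish a 65536-byte buffer inside a 20µs budget, and a loop that polls a reschedule check between fixed-size chunks. Everything named here is hypothetical: `required_bytes_per_sec`, `process_chunked`, `toy_mix`, the callback type standing in for need_resched(), and the 4096-byte `CHUNK_BYTES`, which the thread does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk size between checks; the thread does not name a value. */
#define CHUNK_BYTES 4096u

/* Hypothetical userspace stand-in for the kernel's need_resched(). */
typedef int (*resched_check_fn)(void);

/* Throughput (bytes/second) needed to finish `bytes` within `budget_us`. */
static uint64_t required_bytes_per_sec(uint64_t bytes, uint64_t budget_us)
{
    assert(budget_us != 0);
    return bytes * 1000000u / budget_us;
}

/* Toy per-chunk work standing in for a SIMD cipher; not a real algorithm. */
static uint32_t toy_mix(uint32_t acc, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + p[i];
    return acc;
}

/*
 * Process `len` bytes in CHUNK_BYTES pieces, polling `check` between chunks.
 * Returns the number of polls made; *out receives the toy result.
 * In the kernel, a positive check would be the point to leave the SIMD
 * section and let the scheduler run; here we only count such requests.
 */
static size_t process_chunked(const uint8_t *data, size_t len,
                              resched_check_fn check, size_t *yields,
                              uint32_t *out)
{
    uint32_t acc = 0;
    size_t polls = 0;

    while (len > 0) {
        size_t n = len < CHUNK_BYTES ? len : CHUNK_BYTES;

        acc = toy_mix(acc, data, n);
        data += n;
        len -= n;

        if (len > 0) {  /* nothing left to defer after the last chunk */
            polls++;
            if (check())
                (*yields)++;
        }
    }
    *out = acc;
    return polls;
}

/* Two trivial check functions for experimenting with the loop. */
static int never_resched(void) { return 0; }
static int always_resched(void) { return 1; }
```

With these definitions, `required_bytes_per_sec(65536, 20)` works out to 3,276,800,000 bytes per second, roughly 3.3 GB/s sustained, which illustrates why a 65k batch does not fit the 20µs figure. By contrast, a 65536-byte buffer split into 4096-byte chunks adds only 15 cheap polls.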