Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Thu, 27 Sep 2018 09:26:59 -0700

> On Sep 27, 2018, at 8:19 AM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> 
> Hey again Thomas,
> 
>> On Thu, Sep 27, 2018 at 3:26 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>> 
>> Hi Thomas,
>> 
>> I'm trying to optimize this for crypto performance while still taking
>> into account preemption concerns. I'm having a bit of trouble figuring
>> out a way to determine numerically what the upper bounds for this
>> stuff looks like. I'm sure I could pick a pretty sane number that's
>> arguably okay -- and way under the limit -- but I still am interested
>> in determining what that limit actually is. I was hoping there'd be a
>> debugging option called, "warn if preemption is disabled for too
>> long", or something, but I couldn't find anything like that. I'm also
>> not quite sure what the latency limits are, to just compute this with
>> a formula. Essentially what I'm trying to determine is:
>> 
>> preempt_disable();
>> asm volatile(".fill N, 1, 0x90;");
>> preempt_enable();
>> 
>> What is the maximum value of N for which the above is okay? What
>> technique would you generally use in measuring this?
>> 
>> Thanks,
>> Jason
> 
> From talking to Peter (now CC'd) on IRC, it sounds like what you're
> mostly interested in is clocktime latency on reasonable hardware, with
> a goal of around ~20µs as a maximum upper bound? I don't expect to get
> anywhere near this value at all, but if you can confirm that's a
> decent ballpark, it would make for some interesting calculations.
> 
> 

I would add another consideration: if you can get better latency with negligible overhead (0.1%? 0.05%), then that might make sense too. For example, it seems plausible that checking need_resched() every few blocks adds basically no overhead, and the SIMD helpers could do this themselves or perhaps only ever do a block at a time.

need_resched() costs a cacheline access, but it’s usually a hot cacheline, and the actual check is just whether a certain bit in memory is set.