Re: [PATCH] libuuid: use kernel crypto api

On Sun, Aug 05, 2018 at 11:42:09AM +0100, Sami Kerola wrote:
> 
> I should have told in that commit message part of the motivation was to 
> deprecate util-linux local md5 implementation. But since both of you 
> raised concern about performance I decided to test kernel api and 
> util-linux implementations as close the same way as they are used in 
> libuuid.
> 
> Executive summary: kernel api is surprisingly slow.

You're probably testing on an x86-64 system with the kernel mitigations
for Spectre and Meltdown enabled.

Both of those add *significant* overhead to every system call (or
other kernel entry/exit, like interrupts).

e.g. in comments on Stack Overflow, @BeeOnRope found that a `syscall`
instruction with an invalid call number takes about 1800 cycles on a
Skylake CPU running Linux (in late February 2018).
https://stackoverflow.com/questions/48913091/fastest-linux-system-call#comment84843442_48914200

(Unfortunately, I don't know of a better or more detailed analysis of
system call costs anywhere.)
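
A rough way to check the per-call cost on a particular machine is a
timing loop like the sketch below.  It is only an illustration, not
the method used for the number quoted above: avg_syscall_ns() is a
hypothetical helper, it reports wall-clock nanoseconds rather than
cycles, and the loop and clock_gettime() overhead are included in the
result.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average wall-clock nanoseconds per call of raw system call `nr`.
 * Hypothetical helper for illustration; not part of util-linux.
 */
double avg_syscall_ns(long nr, unsigned long iterations)
{
    assert(iterations > 0);
    uint64_t start = now_ns();
    for (unsigned long i = 0; i < iterations; i++)
        syscall(nr);    /* result ignored; only the round trip matters */
    uint64_t end = now_ns();
    return (double)(end - start) / (double)iterations;
}
```

Passing -1 as `nr` gives the invalid-call-number case (which normally
fails with ENOSYS), and SYS_getppid is a cheap valid call to compare
against.  Multiplying the result by the core clock in GHz gives an
approximate cycle count.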

Most of that cost is in the WRMSR that flushes branch predictors,
using Intel's newly-introduced (and *not* fast) microcode assistance
for Spectre.  Possibly future hardware will make this cheaper, but on
current hardware it just sucks to make system calls.

Thanks to Meltdown mitigation, you get extra TLB misses in the kernel
and after returning to user-space.  (This may be less bad than with the
early patches, thanks to the use of hardware PCIDs.)  But even just the
MOV to CR3 to change the top-level page table takes some time.

I'm not surprised that you found a 10x slowdown for short messages.  
Amortizing the kernel entry/exit over a larger buffer is the only way
for it not to be horrible.
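
To make the amortization argument concrete, here is a back-of-envelope
model, assuming a fixed cost per kernel entry plus hashing work that
scales linearly with message length.  The function name and every
parameter are illustrative assumptions: the only figure taken from
above is the roughly 1800 cycles per entry, while the per-byte hashing
cost and the number of kernel entries per hash are placeholders to be
measured.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimated slowdown of hashing `len` bytes through the kernel,
 * relative to hashing them in user space at the same per-byte cost.
 *
 *   entry_cycles     fixed cost of one kernel entry/exit
 *   entries          kernel entries needed per hash (assumption)
 *   cycles_per_byte  hashing cost per byte (assumption)
 */
double slowdown_vs_userspace(double entry_cycles, unsigned entries,
                             double cycles_per_byte, size_t len)
{
    assert(len > 0 && cycles_per_byte > 0.0);
    double work = cycles_per_byte * (double)len;      /* useful hashing work */
    double overhead = entry_cycles * (double)entries; /* fixed per-hash cost */
    return (work + overhead) / work;
}
```

With a placeholder 5 cycles/byte and a single 1800-cycle entry, a
360-byte message comes out exactly 2x slower, while a 1 MiB buffer is
within 1% of the user-space cost.  Shorter messages, or more kernel
entries per hash, push the ratio up quickly.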

If you're curious, you could try booting with the workarounds disabled
(or an old kernel) to see how much perf difference that makes.  The
SYSCALL / SYSRET instructions themselves only take something in the
ballpark of 50 cycles on Skylake or Ryzen, IIRC, and Linux's
system-call dispatch code is pretty efficient for the fast path.  Even
that might still be measurable overhead for MD5 on a short message,
though.

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC


