I'm using GCC 8.1 (built myself) on GNU/Linux for this testing. The code is compiled with -O2 -g. I replaced the arch-specific implementation of a function we use a LOT which returns a value in microseconds that we compare to get elapsed time (but never convert into a timestamp) with std::chrono like this: using std::chrono; return time_point_cast<microseconds>(steady_clock::now()) .time_since_epoch().count(); Unfortunately after I did this I got pinged by the perf folks that I had introduced a 24% slowdown in some benchmarks. These benchmarks are testing general application operations, and they compute elapsed time using a Python framework so that computation is not impacted by the above change. The slowdown is widespread across many tests. I can reproduce this slowdown on my local system. I've tried it with both steady_clock and system_clock and both show the same slowdown. I looked at the code in libstdc++-v3 and as far as I can tell I'm using CLOCK_MONOTONIC for my steady_clock; this is in libstdc++-v3/config.h: #define _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL 1 #define _GLIBCXX_USE_CLOCK_MONOTONIC 1 #define _GLIBCXX_USE_CLOCK_REALTIME 1 But if I replace the above code with a direct call to clock_gettime(): struct timespec ts; clock_gettime(CLOCK_MONOTONIC, &ts); return ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000; then I get back all my performance! Any idea why the std::chrono implementation is so much slower? I looked at the implementation and I didn't see anything obvious. Here's a gdb disassembly of the "fast" clock_gettime() version: <+0>: push %rbp <+1>: mov $0x1,%edi <+6>: mov %rsp,%rbp <+9>: lea -0x10(%rsp),%rsp <+14>: lea -0x10(%rbp),%rsi <+18>: callq 0x40c730 <clock_gettime@plt> <+23>: mov -0x8(%rbp),%rcx <+27>: movabs $0x20c49ba5e353f7cf,%rdx <+37>: imul $0xf4240,-0x10(%rbp),%rsi <+45>: mov %rcx,%rax <+48>: imul %rdx <+51>: sar $0x3f,%rcx <+55>: leaveq <+56>: sar $0x7,%rdx <+60>: sub %rcx,%rdx <+63>: lea (%rsi,%rdx,1),%rax <+67>: retq xchg %ax,%ax nopw %cs:0x0(%rax,%rax,1) And here's a disassembly of the slow chrono version: <+0>: push %rbp <+1>: mov %rsp,%rbp <+4>: callq 0xe62720 <_ZNSt6chrono3_V212steady_clock3nowEv> <+9>: movabs $0x20c49ba5e353f7cf,%rdx <+19>: mov %rax,%rcx <+22>: imul %rdx <+25>: sar $0x3f,%rcx <+29>: pop %rbp <+30>: sar $0x7,%rdx <+34>: mov %rdx,%rax <+37>: sub %rcx,%rax <+40>: retq nop nopw 0x0(%rax,%rax,1) And of the std::chrono::steady_clock::now() function: Dump of assembler code for function _ZNSt6chrono3_V212steady_clock3nowEv: <+0>: sub $0x18,%rsp <+4>: mov $0x1,%esi <+9>: mov $0xe4,%edi <+14>: xor %eax,%eax <+16>: mov %rsp,%rdx <+19>: callq 0x40b950 <syscall@plt> <+24>: imul $0x3b9aca00,(%rsp),%rax <+32>: add 0x8(%rsp),%rax <+37>: add $0x18,%rsp <+41>: retq My assembly is pretty lame, so maybe I'm missing something, but it seems pretty close to the same thing to me, and does appear to be using CLOCK_MONOTONIC (which is code 1) in the syscall.