On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>> Arnd Bergmann <arnd@xxxxxxxx> writes:
>>>
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>>> timer tick.
>>
>> Ah you're right. I remember now: the motivation was to make sure there
>> is basically no overhead. In some setups the full gtod can be rather
>> slow, particularly if it falls back to some crappy timer.
>
> This means, we're probably fine with a compile-time option that
> distros can choose to enable depending on what classes of hardware
> they are targetting, like
>
> struct timespec64 current_time(struct inode *inode)
> {
>         struct timespec64 now;
>         u64 gran = inode->i_sb->s_time_gran;
>
>         if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
>             gran <= NSEC_PER_JIFFY)
>                 ktime_get_real_ts64(&now);
>         else
>                 ktime_get_coarse_real_ts64(&now);
>
>         return timespec64_trunc(now, gran);
> }
>
> With that implementation, we could still let file systems choose
> to get coarse timestamps by tuning the granularity in the
> superblock s_time_gran, which would result in nice round
> tv_nsec values that represent the actual accuracy.

I've done some simple tests and found that on a variety of x86,
arm32 and arm64 CPUs, it takes between 70 and 100 CPU cycles to
read the TSC and add it to the coarse clock, e.g. on a 3.1GHz Ryzen,
using the little test program below:

vdso hires:   37.18ns
vdso coarse:    6.44ns
sysc hires:  161.62ns
sysc coarse:  133.87ns

On the same machine, it takes around 400ns (1240 cycles) to write
one byte into a tmpfs file with pwrite(). Adding 5% to 10% overhead
for accurate timestamps would definitely be noticed, so I guess we
wouldn't enable that unconditionally, but could do it as an opt-in
mount option if someone had a use case.

      Arnd
---
/* measure times for high-resolution clocksource access from userspace */
#define _GNU_SOURCE	/* for syscall() when built with a strict -std= */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/syscall.h>

/* read the clock either through libc (normally the vdso) or a real syscall */
static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
	if (vdso)
		return clock_gettime(clkid, tp);

	return syscall(__NR_clock_gettime, clkid, tp);
}

/* count how many clock reads complete within one second */
static int loop1sec(clockid_t clkid, bool vdso)
{
	int i;
	struct timespec t, start;

	do_clock_gettime(clkid, &start, vdso);
	i = 0;
	do {
		do_clock_gettime(clkid, &t, vdso);
		i++;
	} while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

	return i;
}

int main(void)
{
	/* one second divided by the number of reads gives ns per read */
	printf("vdso hires: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME, true));
	printf("vdso coarse: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, true));
	printf("sysc hires: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME, false));
	printf("sysc coarse: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, false));

	return 0;
}