Re: [PATCH] vfs: replace current_kernel_time64 with ktime equivalent

On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>> Arnd Bergmann <arnd@xxxxxxxx> writes:
>>>
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>>> timer tick.
>>
>> Ah you're right. I remember now: the motivation was to make sure there
>> is basically no overhead. In some setups the full gtod can be rather
>> slow, particularly if it falls back to some crappy timer.
>
> This means we're probably fine with a compile-time option that
> distros can choose to enable depending on what classes of hardware
> they are targeting, like
>
> struct timespec64 current_time(struct inode *inode)
> {
>         struct timespec64 now;
>         u64 gran = inode->i_sb->s_time_gran;
>
>         if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
>             gran <= NSEC_PER_JIFFY)
>                 ktime_get_real_ts64(&now);
>         else
>                 ktime_get_coarse_real_ts64(&now);
>
>         return timespec64_trunc(now, gran);
> }
>
> With that implementation, we could still let file systems choose
> to get coarse timestamps by tuning the granularity in the
> superblock s_time_gran, which would result in nice round
> tv_nsec values that represent the actual accuracy.
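To illustrate the "nice round tv_nsec values" point, here is a minimal
userspace sketch of truncating a timestamp to a granularity. The helper
name trunc_to_gran() is hypothetical and only approximates what a
timespec64_trunc()-style helper would do; it is not the kernel code.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical userspace analogue of truncating a timestamp to a
 * granularity given in nanoseconds. A granularity of one second or
 * more clears tv_nsec; anything in between rounds tv_nsec down to a
 * multiple of gran; a granularity of 1 leaves the value untouched.
 */
static struct timespec trunc_to_gran(struct timespec t, long gran)
{
        if (gran >= NSEC_PER_SEC)
                t.tv_nsec = 0;
        else if (gran > 1)
                t.tv_nsec -= t.tv_nsec % gran;

        return t;
}
```

With a granularity of 1000000 (one millisecond), a tv_nsec of 123456789
becomes 123000000, so the stored value no longer suggests more accuracy
than the clock provides.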

I've done some simple tests and found that on a variety of
x86, arm32 and arm64 CPUs, it takes between 70 and 100
CPU cycles to read the hardware counter (the TSC on x86) and
add it to the coarse clock. For example, on a 3.1GHz Ryzen,
the little test program below reports:

vdso hires:       37.18ns
vdso coarse:       6.44ns
sysc hires:      161.62ns
sysc coarse:     133.87ns

On the same machine, it takes around 400ns (1240 cycles)
to write one byte into a tmpfs file with pwrite(). Adding
70 to 100 cycles to that is a 5% to 10% overhead for accurate
timestamps, which would definitely be noticed, so I guess we
wouldn't enable that unconditionally, but could do it as an
opt-in mount option if someone had a use case.
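
As a cross-check of the arithmetic, here is a small sketch that converts
the measured numbers into cycles and relative overhead. The helper names
are made up for illustration, and it assumes the extra cost of a hires
timestamp is simply the difference between the two vdso figures.

```c
/* Convert a duration in nanoseconds to CPU cycles at a clock rate in GHz. */
static double ns_to_cycles(double ns, double ghz)
{
        return ns * ghz;
}

/* Relative cost, in percent, of adding extra_ns to an operation of base_ns. */
static double overhead_pct(double extra_ns, double base_ns)
{
        return 100.0 * extra_ns / base_ns;
}
```

Plugging in the Ryzen results, the hires read costs 37.18 - 6.44 = 30.74ns
more than the coarse one, which ns_to_cycles(30.74, 3.1) puts at about
95 cycles, and overhead_pct(30.74, 400.0) comes out a little under 8%.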

       Arnd

---
/* measure times for high-resolution clocksource access from userspace */
#define _GNU_SOURCE     /* make sure syscall() is declared */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/syscall.h>

/* vdso == true goes through libc (and its vdso fast path), false forces a real syscall */
static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
        if (vdso)
                return clock_gettime(clkid, tp);

        return syscall(__NR_clock_gettime, clkid, tp);
}

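/* count how many clock reads complete within one second of wall time */
static int loop1sec(clockid_t clkid, bool vdso)
{
        int i;
        struct timespec t, start;

        do_clock_gettime(clkid, &start, vdso);
        i = 0;
        do {
                do_clock_gettime(clkid, &t, vdso);
                i++;
                /* keep going until t is at least one second past start */
        } while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

        return i;
}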

int main(void)
{
        printf("vdso hires:     %7.2fns\n",
               1000000000.0 / loop1sec(CLOCK_REALTIME, true));
        printf("vdso coarse:    %7.2fns\n",
               1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, true));
        printf("sysc hires:     %7.2fns\n",
               1000000000.0 / loop1sec(CLOCK_REALTIME, false));
        printf("sysc coarse:    %7.2fns\n",
               1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, false));

        return 0;
}


