Re: [RFC 00/20] ns: Introduce Time Namespace

Dmitry Safonov <dima@xxxxxxxxxx> writes:

> Discussions around time virtualization have been going on for a long
> time. The first attempt to implement a time namespace was in 2006 by
> Jeff Dike. Since then, the topic has come up on and off in various
> discussions.
>
> There are two main use cases for time namespaces:
> 1. change date and time inside a container;
> 2. adjust clocks for a container restored from a checkpoint.
>
> “It seems like this might be one of the last major obstacles keeping
> migration from being used in production systems, given that not all
> containers and connections can be migrated as long as a time dependency
> is capable of messing it up.” (by github.com/dav-ell)
>
> The kernel provides access to several clocks: CLOCK_REALTIME,
> CLOCK_MONOTONIC, CLOCK_BOOTTIME. The last two clocks are monotonic, but
> their start points are not defined and differ on each running
> system. When a container is migrated from one node to another, all
> clocks have to be restored into consistent states; in other words, they
> have to continue running from the same points where they were
> dumped.
>
> The main idea behind this patch set is adding per-namespace offsets for
> system clocks. When a process in a non-root time namespace requests
> time of a clock, a namespace offset is added to the current value of
> this clock on a host and the sum is returned.
>
> All offsets are placed on a separate page. This allows us to map it as
> part of vvar into user processes and use offsets from vdso calls.
>
> Now offsets are implemented for CLOCK_MONOTONIC and CLOCK_BOOTTIME
> clocks.
>
> Questions to discuss:
>
> * Clone flags exhaustion. Currently there is only one unused clone flag
> bit left, and it may be worth using it to extend the arguments of the
> clone system call.
>
> * Realtime clock implementation details:
>   Is having a simple offset enough?
>   What to do when date and time are changed on the host?
>   Is there a need to adjust vfs modification and creation times?
>   Implementation for adjtime() syscall.
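
To make the per-namespace offset idea quoted above concrete, here is a
minimal user-space sketch.  It is not the code from the patch set; the
helper names timens_add_offset() and timens_clock_gettime() are
hypothetical, and the offset is assumed to be a plain struct timespec
whose tv_nsec lies strictly between -1s and +1s.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper (not from the patch set): add a namespace offset
 * to a host clock reading and normalise tv_nsec into [0, NSEC_PER_SEC).
 */
struct timespec timens_add_offset(struct timespec host, struct timespec off)
{
	struct timespec r;

	/* Assumed preconditions on the inputs. */
	assert(host.tv_nsec >= 0 && host.tv_nsec < NSEC_PER_SEC);
	assert(off.tv_nsec > -NSEC_PER_SEC && off.tv_nsec < NSEC_PER_SEC);

	r.tv_sec = host.tv_sec + off.tv_sec;
	r.tv_nsec = host.tv_nsec + off.tv_nsec;
	if (r.tv_nsec >= NSEC_PER_SEC) {
		r.tv_sec++;
		r.tv_nsec -= NSEC_PER_SEC;
	} else if (r.tv_nsec < 0) {
		r.tv_sec--;
		r.tv_nsec += NSEC_PER_SEC;
	}
	return r;
}

/*
 * Hypothetical wrapper: read the host clock, then return host + offset,
 * which is what a process in a non-root time namespace would see.
 */
int timens_clock_gettime(clockid_t id, struct timespec off,
			 struct timespec *out)
{
	struct timespec host;

	if (clock_gettime(id, &host) != 0)
		return -1;
	*out = timens_add_offset(host, off);
	return 0;
}
```

Calling timens_clock_gettime(CLOCK_MONOTONIC, off, &ts) with a one hour
offset would report a monotonic clock one hour ahead of the host's.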

Overall I support this effort.  In my quick skim this code looked good.

My feeling is that we need to be able to support running ntpd and
support one namespace doing Google's smoothing of leap seconds while
another namespace takes the leap second.
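
As a rough illustration of why those two namespaces cannot share one
fixed offset, here is a sketch of the two behaviours.  The linear shape
and the window length are assumptions made for the example, not a
description of any particular smearing implementation, and the function
names smear_ns() and step_ns() are hypothetical.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Smearing clock: nanoseconds of the leap second absorbed so far,
 * `elapsed_s` seconds into a window of `window_s` seconds, assuming
 * a linear smear.
 */
long smear_ns(long elapsed_s, long window_s)
{
	assert(window_s > 0);
	if (elapsed_s <= 0)
		return 0;
	if (elapsed_s >= window_s)
		return NSEC_PER_SEC;
	return (long)((long long)elapsed_s * NSEC_PER_SEC / window_s);
}

/*
 * Stepping clock: the whole second is absorbed at one instant,
 * `leap_at_s` seconds into the same window.
 */
long step_ns(long elapsed_s, long leap_at_s)
{
	return elapsed_s >= leap_at_s ? NSEC_PER_SEC : 0;
}
```

The difference between the two functions changes continuously across
the window, so the offset between the two namespaces is not a constant.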

What I was imagining when I was last thinking about this was one
instance of struct timekeeper aka tk_core per time namespace.  That
structure already keeps offsets for all of the various clocks from
the kernel's internal time sources.  What would be needed would be to
pass in an appropriate time namespace pointer.

I could be completely wrong as I have not taken the time to completely
trace through the code.  Have you looked at pushing the time namespace
down as far as tk_core?

What I think would be the big advantage (besides ntp working) is that
the bulk of the code could be reused.  That would also allow testing
the kernel's time code by setting up a new time namespace.  So a person
in production could set up a time namespace with the time set ahead a
little bit and be able to verify that the kernel handles the upcoming
leap second properly.



I don't know about the vfs.  I think the danger is being able to write
dates in the future or in the past.  It appears that utimes(2) and
utimensat(2) already allow this, except for the status change time.  So
it is possible we simply don't care.  I seem to remember that what nfs
does is take the time stamp from the host writing to the file.
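
For reference, a small sketch of what utimensat(2) already permits
today.  The helper names set_file_times() and get_file_times() are made
up for the example; only utimensat() and stat() are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>	/* utimensat(), stat() */
#include <time.h>

/*
 * Hypothetical helper: set both the access and the modification time
 * of `path` to `sec` seconds since the epoch.  Returns 0 on success.
 */
int set_file_times(const char *path, time_t sec)
{
	struct timespec ts[2] = {
		{ .tv_sec = sec, .tv_nsec = 0 },	/* atime */
		{ .tv_sec = sec, .tv_nsec = 0 },	/* mtime */
	};

	return utimensat(AT_FDCWD, path, ts, 0);
}

/*
 * Hypothetical helper: read back mtime and ctime.  The kernel sets
 * ctime itself; it cannot be chosen through utimensat().
 */
int get_file_times(const char *path, time_t *mtime, time_t *ctime_out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	*mtime = st.st_mtime;
	*ctime_out = st.st_ctime;
	return 0;
}
```

After set_file_times(path, some_old_date) on a file the caller owns,
mtime reads back as the chosen date while ctime reflects the host clock
at the time of the call.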

I think the guide for filesystem timestamps should be to first ensure
we don't introduce security issues, and then do what distributed
filesystems do when dealing with hosts with different clocks.

Given those two guidelines above I don't think there is a need to
change timestamps the way the user namespace changes uid when displayed.



As for the hardware like the real time clock we definitely should not
let a root in a time namespace change it.  We might even be able to get
away with leaving the real time clock out of the time namespace.  If not
we need to be very careful how the real time clock is abstracted.  I
would start by leaving the real time clock hardware out of the time
namespace and see if there is any part of userspace that cares.

Eric

> Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
> Cc: Adrian Reber <adrian@xxxxxxxx>
> Cc: Andrei Vagin <avagin@xxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx>
> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> 
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jeff Dike <jdike@xxxxxxxxxxx>
> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> Cc: Shuah Khan <shuah@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> Cc: criu@xxxxxxxxxx
> Cc: linux-api@xxxxxxxxxxxxxxx
> Cc: x86@xxxxxxxxxx
>
> Andrei Vagin (12):
>   ns: Introduce Time Namespace
>   timens: Add timens_offsets
>   timens: Introduce CLOCK_MONOTONIC offsets
>   timens: Introduce CLOCK_BOOTTIME offset
>   timerfd/timens: Take into account ns clock offsets
>   kernel: Take into account timens clock offsets in clock_nanosleep
>   x86/vdso/timens: Add offsets page in vvar
>   x86/vdso: Use set_normalized_timespec() to avoid 32 bit overflow
>   posix-timers/timens: Take into account clock offsets
>   selftest/timens: Add test for timerfd
>   selftest/timens: Add test for clock_nanosleep
>   timens/selftest: Add timer offsets test
>
> Dmitry Safonov (8):
>   timens: Shift /proc/uptime
>   x86/vdso: Restrict splitting vvar vma
>   x86/vdso: Purge timens page on setns()/unshare()/clone()
>   x86/vdso: Look for vvar vma to purge timens page
>   timens: Add align for timens_offsets
>   timens: Optimize zero-offsets
>   selftest: Add Time Namespace test for supported clocks
>   timens/selftest: Add procfs selftest
>
>  arch/Kconfig                                     |   5 +
>  arch/x86/Kconfig                                 |   1 +
>  arch/x86/entry/vdso/vclock_gettime.c             |  52 +++++
>  arch/x86/entry/vdso/vdso-layout.lds.S            |   9 +-
>  arch/x86/entry/vdso/vdso2c.c                     |   3 +
>  arch/x86/entry/vdso/vma.c                        |  67 +++++++
>  arch/x86/include/asm/vdso.h                      |   2 +
>  fs/proc/namespaces.c                             |   3 +
>  fs/proc/uptime.c                                 |   3 +
>  fs/timerfd.c                                     |  16 +-
>  include/linux/nsproxy.h                          |   1 +
>  include/linux/proc_ns.h                          |   1 +
>  include/linux/time_namespace.h                   |  72 +++++++
>  include/linux/timens_offsets.h                   |  25 +++
>  include/linux/user_namespace.h                   |   1 +
>  include/uapi/linux/sched.h                       |   1 +
>  init/Kconfig                                     |   8 +
>  kernel/Makefile                                  |   1 +
>  kernel/fork.c                                    |   3 +-
>  kernel/nsproxy.c                                 |  19 +-
>  kernel/time/hrtimer.c                            |   8 +
>  kernel/time/posix-timers.c                       |  89 ++++++++-
>  kernel/time/posix-timers.h                       |   2 +
>  kernel/time_namespace.c                          | 230 +++++++++++++++++++++++
>  tools/testing/selftests/timens/.gitignore        |   5 +
>  tools/testing/selftests/timens/Makefile          |   6 +
>  tools/testing/selftests/timens/clock_nanosleep.c |  98 ++++++++++
>  tools/testing/selftests/timens/config            |   1 +
>  tools/testing/selftests/timens/log.h             |  21 +++
>  tools/testing/selftests/timens/procfs.c          | 145 ++++++++++++++
>  tools/testing/selftests/timens/timens.c          | 196 +++++++++++++++++++
>  tools/testing/selftests/timens/timer.c           |  95 ++++++++++
>  tools/testing/selftests/timens/timerfd.c         |  96 ++++++++++
>  33 files changed, 1272 insertions(+), 13 deletions(-)
>  create mode 100644 include/linux/time_namespace.h
>  create mode 100644 include/linux/timens_offsets.h
>  create mode 100644 kernel/time_namespace.c
>  create mode 100644 tools/testing/selftests/timens/.gitignore
>  create mode 100644 tools/testing/selftests/timens/Makefile
>  create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
>  create mode 100644 tools/testing/selftests/timens/config
>  create mode 100644 tools/testing/selftests/timens/log.h
>  create mode 100644 tools/testing/selftests/timens/procfs.c
>  create mode 100644 tools/testing/selftests/timens/timens.c
>  create mode 100644 tools/testing/selftests/timens/timer.c
>  create mode 100644 tools/testing/selftests/timens/timerfd.c


