On Monday 20 April 2015 22:40:57 Thomas Gleixner wrote: > On Mon, 20 Apr 2015, Baolin Wang wrote: > > @@ -771,6 +771,7 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, > > struct itimerspec __user *, setting) > > { > > struct itimerspec cur_setting; > > + struct itimerspec64 cur_setting64; > > struct k_itimer *timr; > > struct k_clock *kc; > > unsigned long flags; > > @@ -781,10 +782,16 @@ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, > > return -EINVAL; > > > > kc = clockid_to_kclock(timr->it_clock); > > - if (WARN_ON_ONCE(!kc || !kc->timer_get)) > > + if (WARN_ON_ONCE(!kc || (!kc->timer_get && !kc->timer_get64))) { > > ret = -EINVAL; > > - else > > - kc->timer_get(timr, &cur_setting); > > + } else { > > + if (kc->timer_get64) { > > + kc->timer_get64(timr, &cur_setting64); > > + cur_setting = itimerspec64_to_itimerspec(cur_setting64); > > + } else { > > + kc->timer_get(timr, &cur_setting); > > + } > > + } > > This is really horrible. You add a metric ton of conditionals to every > syscall just to remove them later again. I have not yet checked the > end result, but this approach is error prone as hell and just > introduces completely useless code churn. > > It's useless because you do not factor out the guts of the syscall > functions so we can reuse the very same logic for the future 2038 safe > syscalls which we need to introduce for 32bit machines. Hi Thomas, I should probably go ahead and introduce all the new syscalls in a separate patch series. I have my own ideas about how to get there, in a slightly different way from what you propose: > Take a look at the compat syscalls. They do the right thing. > > COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, > struct compat_itimerspec __user *, setting) > { > long err; > mm_segment_t oldfs; > struct itimerspec ts; > > oldfs = get_fs(); > set_fs(KERNEL_DS); > err = sys_timer_gettime(timer_id, > (struct itimerspec __user *) &ts); > set_fs(oldfs); > if (!err && put_compat_itimerspec(setting, &ts)) > return -EFAULT; > return err; > } As a side note, I want to kill off the get_fs()/set_fs() calls in the process. These always make me dizzy when I try to work out whether there is a potential security hole (there is not in this case), and they can be slow on some architectures. > So we can be clever and do the following: > > 1) Preparatory work in posix-timer.c (Patch #1) > > 2) Do the 64bit infrastructure work in posix-timer.c (Patch #2) > > The result is two simple to review patches with minimal code churn. This all sounds great, good idea. > The nice thing is that once we introduce new syscalls for 32bit > machines, e.g. sys_timer_gettime64(), all we need to do is: > > /* Get the time remaining on a POSIX.1b interval timer. */ > SYSCALL_DEFINE2(timer_gettime64, timer_t, timer_id, > struct itimerspec64 __user *, setting) > { > struct itimerspec64 cur_setting64; > int ret = __timer_gettime(timer_id, &cur_setting64); > > if (!ret && copy_to_user(setting, &cur_setting, sizeof (cur_setting))) > return -EFAULT; > > return ret; > } > > And on 64bit timer_gettime64() and timer_gettime() are the same, so we > just need to do a clever mapping of timer_gettime() to > timer_gettime64(). Not rocket science.... > > For 32 bit we provide the old timer_gettime() for non converted > applications: > > #ifdef CONFIG_32BIT_OLD_TIMESPEC_SYSCALLS > /* Get the time remaining on a POSIX.1b interval timer in the old timespec format. */ > SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, > struct itimerspec __user *, setting) > { > struct itimerspec64 cur_setting64; > struct itimerspec cur_setting; > int ret = __timer_gettime(timer_id, &cur_setting64); > > if (!ret) { > cur_setting = itimerspec64_to_itimerspec(&cur_setting64); > > if (copy_to_user(setting, &cur_setting, sizeof (cur_setting))) > return -EFAULT; > } > return ret; > } > #endif > > Simple, isn't it? No useless churn. Proper refactoring for the next > step. No useless copying for 64 bit. My preferred solution is one where we end up with the same syscalls for both 32-bit and 64-bit, and basically use the compat_sys_timer_gettime() implementation (or a simplified version) for the existing , something like this: diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index ef8187f9d28d..71e74a47a2cf 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -267,7 +267,7 @@ 258 i386 set_tid_address sys_set_tid_address 259 i386 timer_create sys_timer_create compat_sys_timer_create 260 i386 timer_settime sys_timer_settime compat_sys_timer_settime -261 i386 timer_gettime sys_timer_gettime compat_sys_timer_gettime +261 i386 timer_gettime compat_sys_timer_gettime 262 i386 timer_getoverrun sys_timer_getoverrun 263 i386 timer_delete sys_timer_delete 264 i386 clock_settime sys_clock_settime compat_sys_clock_settime @@ -365,3 +365,4 @@ 356 i386 memfd_create sys_memfd_create 357 i386 bpf sys_bpf 358 i386 execveat sys_execveat stub32_execveat +359 i386 timer_gettime64 sys_timer_gettime diff --git a/kernel/compat.c b/kernel/compat.c index 24f00610c575..15d4f5abb492 100644 --- a/kernel/compat.c +++ b/kernel/compat.c @@ -723,18 +723,14 @@ COMPAT_SYSCALL_DEFINE4(timer_settime, timer_t, timer_id, int, flags, COMPAT_SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, struct compat_itimerspec __user *, setting) { - long err; - mm_segment_t oldfs; - struct itimerspec ts; + struct itimerspec64 ts; + long ret; - oldfs = get_fs(); - set_fs(KERNEL_DS); - err = sys_timer_gettime(timer_id, - (struct itimerspec __user *) &ts); - set_fs(oldfs); - if (!err && put_compat_itimerspec(setting, &ts)) - return -EFAULT; - return err; + ret = __timer_gettime(timer_id, &ts); + if (ret) + return ret; + + return put_compat_itimerspec(setting, &ts); } COMPAT_SYSCALL_DEFINE2(clock_settime, clockid_t, which_clock, diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 31ea01f42e1f..c601f1969f5a 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -766,32 +766,27 @@ common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting) cur_setting->it_value = ktime_to_timespec(remaining); } +static int put_compat_itimerspec(struct __kernel_itimerspec64 __user *dst, + const struct itimerspec64 *src) +{ + if (__put_timespec64(&src->it_interval, &dst->it_interval) || + __put_timespec64(&src->it_value, &dst->it_value)) + return -EFAULT; + return 0; +} + /* Get the time remaining on a POSIX.1b interval timer. */ SYSCALL_DEFINE2(timer_gettime, timer_t, timer_id, - struct itimerspec __user *, setting) + struct __kernel_itimerspec64 __user *, setting) { - struct itimerspec cur_setting; - struct k_itimer *timr; - struct k_clock *kc; - unsigned long flags; - int ret = 0; - - timr = lock_timer(timer_id, &flags); - if (!timr) - return -EINVAL; + struct itimerspec64 cur_setting; + int ret; - kc = clockid_to_kclock(timr->it_clock); - if (WARN_ON_ONCE(!kc || !kc->timer_get)) - ret = -EINVAL; - else - kc->timer_get(timr, &cur_setting); + ret = __timer_gettime(timer_id, &cur_setting); + if (ret) + return ret; - unlock_timer(timr, flags); - - if (!ret && copy_to_user(setting, &cur_setting, sizeof (cur_setting))) - return -EFAULT; - - return ret; + return put_itimerspec(setting, &cur_setting); } /* Note the use of a separate __kernel_itimerspec64 for the user interface here, which I think will be needed to hide the differences between the normal itimerspec on 64-bit machines, and the new itimerspec on 32-bit platforms that will be defined differently (using 'long long'). My plan for migration here is to temporarily #define __kernel_itimerspec64 as itimerspec on 32-bit architectures (which may be a bit confusing), and then introduce a new CONFIG_COMPAT_TIME option that is set by each architecture that has been converted over to use the new syscalls. This way we can do one syscall at a time at first, followed by one architecture at a time. Unless you have serious objections to that plan, I'd like to work on a first version of this myself and send that out for clarifications. I would also prefer not too many people to work on the syscalls, and would rather have Baolin not touch any of the syscall prototypes for the moment. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html