On Fri, Mar 26, 2021 at 8:31 AM Navin P <navinp0304@xxxxxxxxx> wrote: > > On Thu, Mar 25, 2021 at 8:51 PM <jim.cromie@xxxxxxxxx> wrote: > > > > > > > > On Thu, Mar 25, 2021 at 6:53 AM Navin P <navinp0304@xxxxxxxxx> wrote: > >> > >> Hi, > >> > >> As of 5.11 kernel (pid,pid_start_time) is not unique /monotonic even > >> though the underlying counters are . > >> I chose start_boottime because i wanted the counter to increase > >> during suspend as well. > >> > My patch doesn't create a new field, it uses the existing field and > prints it in /proc/[pid]/stat. Stat files are created > on first lookup. These are used by ps or top. > > + seq_put_decimal_ull(m, " ", task->start_boottime); After printing the task->start_boottime as 53rd field, i found multiple process containing same value due to a race. The race exists for p->start_time as well. Working on the above i found a race in the kernel/fork.c at p->start_boottime = ktime_get_boottime_ns(); i logged the output and compared them to find identical values for many process. I made a patch and this fixes the issue but i'm not happy with the change because it locks the tasklist_lock which is a big lock. What are the other ways that can be used to fix this problem ? I was thinking of changing start_boottime as struct start_boottime { u64 boottime, rw_lock }; Now all my access to boottime go through the rw_lock . Another rwlock has to be created for start_time as well. >From a048ac81b468ec94cb10ca41416cfe9641be3c5b Mon Sep 17 00:00:00 2001 From: Navin P <navinp0304@xxxxxxxxx> Date: Thu, 1 Apr 2021 20:04:24 +0530 Subject: [PATCH] Wrap task->start_boottime in write_lock_irq during creation and filling leader->task_boottime in fs/exec.c . Also wrap read_lock in /proc/fs/array.c for reading task->start_boottime to remove the race. Signed-off-by: Navin P <navinp0304@xxxxxxxxx> --- fs/exec.c | 2 ++ fs/proc/array.c | 8 ++++++-- kernel/fork.c | 6 +++++- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 18594f11c31f..6e29698bdd90 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1109,7 +1109,9 @@ static int de_thread(struct task_struct *tsk) * also take its birthdate (always earlier than our own). */ tsk->start_time = leader->start_time; + write_lock_irq(&tasklist_lock); tsk->start_boottime = leader->start_boottime; + write_unlock_irq(&tasklist_lock); BUG_ON(!same_thread_group(leader, tsk)); /* diff --git a/fs/proc/array.c b/fs/proc/array.c index 74389aaefa9c..dae7e973b9de 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -469,7 +469,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, int num_threads = 0; int permitted; struct mm_struct *mm; - unsigned long long start_time; + unsigned long long start_time,curr_boottime; unsigned long cmin_flt = 0, cmaj_flt = 0; unsigned long min_flt = 0, maj_flt = 0; u64 cutime, cstime, utime, stime; @@ -563,8 +563,12 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, nice = task_nice(task); /* apply timens offset for boottime and convert nsec -> ticks */ + + read_lock(&tasklist_lock); start_time = nsec_to_clock_t(timens_add_boottime_ns(task->start_boottime)); + curr_boottime = task->start_boottime; + read_unlock(&tasklist_lock); seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns)); seq_puts(m, " ("); @@ -645,7 +649,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, else seq_puts(m, " 0"); - seq_put_decimal_ull(m, " ", task->start_boottime); + seq_put_decimal_ull(m, " ", curr_boottime); seq_putc(m, '\n'); if (mm) mmput(mm); diff --git a/kernel/fork.c b/kernel/fork.c index 426cd0c51f9e..41728b9bb76d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2219,11 +2219,15 @@ static __latent_entropy struct task_struct *copy_process( * communication until we take the tasklist-lock. In particular, we do * not want user-space to be able to predict the process start-time by * stalling fork(2) after we recorded the start_time but before it is - * visible to the system. + * visible to the system. Quickly take write lock and fill in details + * preventing race. */ p->start_time = ktime_get_ns(); + write_lock_irq(&tasklist_lock); p->start_boottime = ktime_get_boottime_ns(); + write_unlock_irq(&tasklist_lock); + /* * Make it visible to the rest of the system, but dont wake it up yet. -- 2.25.1 _______________________________________________ Kernelnewbies mailing list Kernelnewbies@xxxxxxxxxxxxxxxxx https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies