Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()

Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx> · Fri, 20 Apr 2012 23:23:27 +0400

Oleg Nesterov wrote:
On 04/13, Oleg Nesterov wrote:

Damn, Konstantin I have to admit, I'll try to find another technical
reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
even with this fix ;)

Most probably I am wrong, but it looks overcomplicated. Somehow I
dislike irrationally the fact you moved mm_release() from exit_mm().

And perhaps you can help me to discredit your patch?

It turns out, I do not really understand this code in do_exit:

	/* sync mm's RSS info before statistics gathering */
	if (tsk->mm)
		sync_mm_rss(tsk->mm);

Which "statistics gathering" ? Probably I missed something, but
after the quick grep it seems to me that this is only needed for
taskstats_exit()->xacct_add_tsk().

So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?

Yes, this way we do not "account" put_user(clear_child_tid) but
I think we do not care.

Why we don't care? Each thread can corrupt these counters by one.
I do not think that we are satisfied with nearly accurate rss accounting.
+/- one page for each clone()-exit().

Actually I don't really like this per-task rss-delta.
Probably it would be better to use per-cpu counters.


IOW, what do you think about the trivial patch below? Uncompiled,
untested, probably incomplete. acct_update_integrals() looks
suspicious too.

what a mess! =)


Oleg.

--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
  	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
  	mm = get_task_mm(p);
  	if (mm) {
+		sync_mm_rss(mm);
  		/* adjust to KB unit */
  		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
  		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
  	mm_release(tsk, mm);
  	if (!mm)
  		return;
+
+	sync_mm_rss(mm);
  	/*
  	 * Serialize with any possible pending coredump.
  	 * We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)
  				preempt_count());

  	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
  	group_dead = atomic_dec_and_test(&tsk->signal->live);
  	if (group_dead) {
  		hrtimer_cancel(&tsk->signal->real_timer);
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
  	/* Notify parent that we're no longer interested in the old VM */
  	tsk = current;
  	old_mm = current->mm;
-	sync_mm_rss(old_mm);
  	mm_release(tsk, old_mm);

  	if (old_mm) {
+		sync_mm_rss(old_mm);
  		/*
  		 * Make sure that if there is a core dump in progress
  		 * for the old mm, we get out and die instead of going

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email:<a href=mailto:"dont@xxxxxxxxx";>  email@xxxxxxxxx</a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>