Re: Linux do_coredump() and SMP systems

Sudharsan Vijayaraghavan <sudvijayr@xxxxxxxxx> · Thu, 19 Feb 2015 17:30:26 +0530

Hi Greg,

There is a plan to move to 3.14; right now the focus is on ironing out
existing issues.
With regard to the core dump issue, we find that about 10% of the time we get stuck in

coredump_wait():
   ==> wait_for_completion(&core_state->startup);
I am analyzing exit_mm() to see what is going wrong here.
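
To make the handshake I am looking at concrete: as far as I understand it, each zapped thread decrements core_state->nr_threads in exit_mm(), and the last one completes core_state->startup, which is what the dumper sleeps on. Below is a rough userspace model of that, using pthreads. It is only a sketch, not kernel code; struct model_core_state, model_sibling() and model_coredump_wait() are names I made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for struct core_state; not kernel code. */
struct model_core_state {
	pthread_mutex_t lock;
	pthread_cond_t startup;	/* plays the role of core_state->startup */
	int nr_threads;		/* siblings that still have to check in */
	int checked_in;		/* siblings that reached the exit path */
};

/* Sibling side: check in, and let the last one wake the dumper. */
static void *model_sibling(void *arg)
{
	struct model_core_state *cs = arg;

	pthread_mutex_lock(&cs->lock);
	cs->checked_in++;
	if (--cs->nr_threads == 0)
		pthread_cond_signal(&cs->startup);
	pthread_mutex_unlock(&cs->lock);
	return NULL;
}

/*
 * Dumper side: start n siblings, then block until all of them have
 * checked in. Returns the number of siblings that checked in.
 */
int model_coredump_wait(int n)
{
	struct model_core_state cs;
	pthread_t tids[64];
	int i, rc, result;

	assert(n > 0 && n <= 64);
	pthread_mutex_init(&cs.lock, NULL);
	pthread_cond_init(&cs.startup, NULL);
	cs.nr_threads = n;
	cs.checked_in = 0;

	for (i = 0; i < n; i++) {
		rc = pthread_create(&tids[i], NULL, model_sibling, &cs);
		assert(rc == 0);
	}

	/* If any sibling never checked in, this wait would never return. */
	pthread_mutex_lock(&cs.lock);
	while (cs.nr_threads > 0)
		pthread_cond_wait(&cs.startup, &cs.lock);
	result = cs.checked_in;
	pthread_mutex_unlock(&cs.lock);

	for (i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&cs.startup);
	pthread_mutex_destroy(&cs.lock);
	return result;
}
```

In this model, model_coredump_wait(4) returns only after all four siblings have checked in. A single sibling that never reaches its check-in would leave the dumper blocked forever, which is the kind of hang I suspect in our case.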

I have one other question that I am curious about.
In coredump_wait():

There is a loop that waits for each task to become inactive (not running on any core):
                ptr = core_state->dumper.next;
                while (ptr != NULL) {

                        pr_err("pid: %d %s() calling wait_task_inactive() for pid : %d\n",
                               tsk->pid, __func__, ptr->task->pid);

                        wait_task_inactive(ptr->task, 0);

                        pr_err("pid: %d %s() wait_task_inactive() returned for pid : %d\n",
                               tsk->pid, __func__, ptr->task->pid);

                        ptr = ptr->next;
                }

There is a delay between the crash and the actual generation of the core dump
due to the above loop.
On a multicore system it is quite possible that other threads of the same
process are running on other cores; as a consequence,
the address space, program counter, etc. can change.

Given this, the generated core dump will not reflect the state of the process
(the various thread registers / mm) as it was at the time of the crash
(of any thread or the main process).
Is my understanding correct? I am just probing for a way to get rid of this discrepancy.
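
To illustrate the concern, here is a small userspace sketch with pthreads and C11 atomics. It is only an analogy and proves nothing about what the kernel does. All names in it (shared_state, stop_requested, model_running_sibling(), model_crash_to_dump_drift()) are hypothetical. One thread keeps modifying shared state on another core while the "crashing" thread asks it to stop and waits for it, so the state seen at "dump" time can be ahead of the state at "crash" time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Stands in for memory that a sibling thread keeps changing. */
static atomic_long shared_state;
/* Stands in for the stop request delivered to sibling threads. */
static atomic_int stop_requested;

/* Sibling thread: keeps running until it notices the stop request. */
static void *model_running_sibling(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_requested))
		atomic_fetch_add(&shared_state, 1);
	return NULL;
}

/* Returns how far shared_state moved between "crash" and "dump". */
long model_crash_to_dump_drift(void)
{
	pthread_t tid;
	long at_crash, at_dump;
	int rc;

	atomic_store(&shared_state, 0);
	atomic_store(&stop_requested, 0);
	rc = pthread_create(&tid, NULL, model_running_sibling, NULL);
	assert(rc == 0);

	at_crash = atomic_load(&shared_state);	/* state when the fault happens */
	atomic_store(&stop_requested, 1);	/* ask the sibling to stop */
	pthread_join(tid, NULL);		/* wait until it is no longer running */
	at_dump = atomic_load(&shared_state);	/* state a dump would record */

	return at_dump - at_crash;
}
```

The returned drift is never negative and is often greater than zero on a multicore machine. Whether a comparable window is really observable in a kernel core dump is exactly what I am asking.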

Thanks,
Sudharsan


On Wed, Feb 18, 2015 at 9:31 PM, Greg KH <greg@xxxxxxxxx> wrote:
> On Wed, Feb 18, 2015 at 11:44:32AM +0530, Sudharsan Vijayaraghavan wrote:
>> We are doing prototype so much change have gone into kernel , we are
>> finding it difficult to upgrade to latest immediately
>
> What changes are you making to the kernel that you are sticking with
> such an old version (3.8 is 2 years old now, and over 155 thousand
> changes have happened to the kernel since then)?

>> However I ran through the code once again, indeed kernel handles it
>> down_write(&mm->mmap_sem); in coredump_wait() makes sure the second
>> coredump is stopped and returns negative for core_waiters
>
> Great, so it works now?
>
> confused,
>
> greg k-h

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies