Re: [RFC][PATCH]: crash aborts with cannot determine idle task

Chandru <chandru@xxxxxxxxxx> · Wed, 10 Jun 2009 13:56:32 +0530

Dave Anderson wrote:
Sorry -- that's not what I meant...  

What I want to avoid is screwing around with the prstatus notes bookkeeping
unless it is absolutely necessary, i.e., where there had been some cpus offlined
prior to the crash.  The original thread back in April 2008 mentioned something
to the effect that your test system only had cpus 12 and 13 online at the time of
the crash.  When that is the case, is kt->cpus equal to 14?  I.e., what
does the "sys" command show for "CPUS:"?

I had the vmcore file from that test system and ran crash with -d1.
The cpu maps shown are ...

cpu_possible_map: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
cpu_present_map: 8 9 10 11 12 13
cpu_online_map: 12 13

The 'sys' command shows CPUS as '14' (with the patch applied):

<snip>

crash> sys
     KERNEL: ./vmlinux
   DUMPFILE: ./vmcore
       CPUS: 14
       DATE: Tue Mar 25 14:43:39 2008
     UPTIME: 08:01:57
LOAD AVERAGE: 12.73, 5.40, 3.72
      TASKS: 262

I also had another vmcore, collected on a different system after offlining
a couple of cpus through the sysfs interface. The cpu maps on this machine
with 'crash -d1' show...

cpu_possible_map: 0 1 2 3
cpu_present_map: 0 1 2 3
cpu_online_map: 2 3

and 'sys' shows:

<snip>
crash> sys
     KERNEL: ./vmlinux
   DUMPFILE: ./vmcore
       CPUS: 4
       DATE: Sat Jun  6 15:00:24 2009
     UPTIME: 15:56:30

I ask because this is the way I'd prefer to go:

void
map_cpus_to_prstatus(void)
{
        void **nt_ptr;
        int online, i, j, nrcpus;
        size_t size;

        if (!(online = get_cpus_online()) || (online == kt->cpus))
                return;

        if (CRASHDEBUG(1))
                error(INFO,
                    "cpus: %d online: %d NT_PRSTATUS notes: %d (remapping)\n",
                        kt->cpus, online, nd->num_prstatus_notes);

        size = NR_CPUS * sizeof(void *);

        nt_ptr = (void **)GETBUF(size);
        BCOPY(nd->nt_prstatus_percpu, nt_ptr, size);
        BZERO(nd->nt_prstatus_percpu, size);

        /*
         *  Re-populate the array with the notes mapping to online cpus
         */
        nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);

        for (i = 0, j = 0; i < nrcpus; i++) {
                if (in_cpu_map(ONLINE, i))
                        nd->nt_prstatus_percpu[i] = nt_ptr[j++];
        }

        FREEBUF(nt_ptr);
}
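
To illustrate what the re-population loop above does, here is a minimal
standalone sketch outside of crash. It assumes the NT_PRSTATUS notes were
stored packed at the front of the per-cpu array (one per online cpu, in cpu
order), which is the situation the function is meant to correct. MAX_CPUS,
remap_notes() and the online[] flag array are hypothetical stand-ins for
NR_CPUS, map_cpus_to_prstatus() and in_cpu_map(); they are not crash code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 16    /* hypothetical stand-in for NR_CPUS */

/*
 * Hypothetical helper mirroring the loop above: notes[] arrives with one
 * entry per online cpu packed at indexes 0..n-1.  On return, each entry
 * sits at the index of the online cpu it belongs to, and every other
 * slot is NULL.  online[i] is nonzero when cpu i was online.
 */
static void remap_notes(const char *notes[MAX_CPUS],
                        const int online[MAX_CPUS])
{
    const char *packed[MAX_CPUS];
    int i, j;

    assert(notes != NULL && online != NULL);

    /* Save the packed entries, then clear the per-cpu array. */
    for (i = 0; i < MAX_CPUS; i++) {
        packed[i] = notes[i];
        notes[i] = NULL;
    }

    /* Hand the saved entries out to the online cpus in ascending order. */
    for (i = 0, j = 0; i < MAX_CPUS; i++) {
        if (online[i])
            notes[i] = packed[j++];
    }
}
```

With only cpus 12 and 13 online, as in the first vmcore, the two notes
found at indexes 0 and 1 end up at indexes 12 and 13, and indexes 0 and 1
become NULL.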

And since kt->cpus may not be finally initialized until later than
kernel_init(), I moved the call to map_cpus_to_prstatus() to here
in task_init():

        if (ACTIVE()) {
                active_pid = REMOTE() ? pc->server_pid : pc->program_pid;
                set_context(NO_TASK, active_pid);
                tt->this_task = pid_to_task(active_pid);
        }
        else {
                if (KDUMP_DUMPFILE())
                        map_cpus_to_prstatus();
                please_wait("determining panic task");
                set_context(get_panic_context(), NO_PID);
                please_wait_done();
        }

Can you test the map_cpus_to_prstatus() function above, along with the
movement of the call to it from kernel_init() to task_init()?

Yes, I tested these changes and they work fine.

Thanks,
Chandru
