Dave Anderson wrote:
Daniel Li wrote:
Dave Anderson wrote:
Daniel Li wrote:
It seems the problem is not one with guest dump, but the version of
SLES.
After upgrading my NATIVE SLES 9 system to SP 3, exactly the same
problem happened while trying to use 'crash' on the live system,
with a debug linux kernel ('vmlinux.dbg' below) built on the same
system from matching 'kernel-source' package. (During this upgrade,
the linux kernel changed from 2.6.5-7.97-smp to 2.6.5-7.244-smp, the
same as that on the guest.)
Has anyone else seen this?
Did anything change in the task_struct between 2.6.5-7.97-smp and
2.6.5-7.244-smp?
Or, more likely, anything associated with the pidhash/pid_hash-related
code in the kernel?
Is the output of the crash command "help -t | grep refresh_task_table"
different when running against 2.6.5-7.97-smp vs. 2.6.5-7.244-smp?
Dave
The definition of task_struct between 2.6.5-7.97-smp and
2.6.5-7.244-smp did change. There is one new 8-byte field called
'last_ran' before the list_head for tasks. This is what I don't get:
why should it matter as long as the dump and debug kernel are using
the same definition?
It shouldn't.
Does the output of "help -o task_struct" on the .97 vs the .244 kernels
reflect the member offset differences as you would expect? I.e.,
everything (that's not -1) coming after the new last_ran member is
bumped up by 8?
And are you sure there's nothing different w/respect to the pid_hash
declarations/usage?
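
To make the expected offset shift concrete, here is a minimal sketch in C.
The two structs are toy stand-ins invented for illustration (they are not
the real SLES task_struct layouts); only the position of the 8-byte
last_ran member, just before the tasks list_head, follows the description
above. Members ahead of last_ran should keep their offsets, and members
after it should move by 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Toy layout standing in for the .97 task_struct (assumption). */
struct task_old {
    long state;
    uint64_t timestamp;
    struct list_head tasks;
    int pid;
};

/* Same toy layout with an 8-byte last_ran added before tasks (.244). */
struct task_new {
    long state;
    uint64_t timestamp;
    uint64_t last_ran;
    struct list_head tasks;
    int pid;
};

/* How far a member moved between the old and new layouts, in bytes. */
#define OFFSET_SHIFT(member) \
    ((long)offsetof(struct task_new, member) - \
     (long)offsetof(struct task_old, member))

long shift_of_state(void) { return OFFSET_SHIFT(state); }
long shift_of_tasks(void) { return OFFSET_SHIFT(tasks); }
long shift_of_pid(void)   { return OFFSET_SHIFT(pid); }

/* Aborts if the layouts do not differ in the expected way. */
void check_offsets_bumped_by_8(void)
{
    assert(shift_of_state() == 0);  /* before last_ran: unchanged */
    assert(shift_of_tasks() == 8);  /* after last_ran: moved by 8 */
    assert(shift_of_pid() == 8);
}
```

Comparing the two "help -o task_struct" listings by hand amounts to the
same check, member by member.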
Dave
The output of "help -t | grep refresh_task_table" didn't change.
The reason I ask about any pid_hash-related changes is because
over the years the manner of task table handling by the crash
utility has had to change to deal with the kernel changes.
The crash-internal tt->refresh_task_table function pointer
that you see in the "help -t" output gets set during task_init()
to one of these functions:
static void refresh_fixed_task_table(void);
static void refresh_unlimited_task_table(void);
static void refresh_pidhash_task_table(void);
static void refresh_pid_hash_task_table(void);
static void refresh_hlist_task_table(void);
static void refresh_hlist_task_table_v2(void);
with later kernels requiring the functions later in the list above.
For a 2.6.5 vintage kernel, I'm guessing that when you did
the "help -t" it showed "refresh_pid_hash_task_table()"?
Anyway, in the two kernels that you are comparing, how is the
"pid_hash" variable declared in the kernel sources? With
respect to the crash-internal setting of tt->refresh_task_table,
it should line up like so:
  kernel: static struct list_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE];
  crash:  refresh_pid_hash_task_table()

  kernel: static struct hlist_head *pid_hash[PIDTYPE_MAX];
  crash:  refresh_hlist_task_table()

  kernel: static struct hlist_head *pid_hash;
  crash:  refresh_hlist_task_table_v2()
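
The pairing above can be sketched as a function-pointer selection. This is
only an illustration of the idea, not the actual task_init() code in crash:
the enum pid_hash_shape, the select_refresh() helper, the cut-down
struct task_table and the stub function bodies are all hypothetical. Only
the three refresh_* function names and the three pid_hash declarations come
from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the three pid_hash declaration shapes. */
enum pid_hash_shape {
    PID_HASH_LIST_2D,     /* struct list_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE] */
    PID_HASH_HLIST_ARRAY, /* struct hlist_head *pid_hash[PIDTYPE_MAX] */
    PID_HASH_HLIST_PTR    /* struct hlist_head *pid_hash */
};

/* Records which stub ran last, so the selection can be observed. */
static const char *last_called = "";

/* Stubs standing in for the real task-gathering functions. */
static void refresh_pid_hash_task_table(void)
{
    last_called = "refresh_pid_hash_task_table";
}

static void refresh_hlist_task_table(void)
{
    last_called = "refresh_hlist_task_table";
}

static void refresh_hlist_task_table_v2(void)
{
    last_called = "refresh_hlist_task_table_v2";
}

/* Cut-down, hypothetical version of the task table structure. */
struct task_table {
    void (*refresh_task_table)(void);
};

/* Pick the gatherer that matches the kernel's pid_hash declaration.
 * Returns 0 on success, -1 if the shape is not recognized. */
int select_refresh(struct task_table *tt, enum pid_hash_shape shape)
{
    switch (shape) {
    case PID_HASH_LIST_2D:
        tt->refresh_task_table = refresh_pid_hash_task_table;
        return 0;
    case PID_HASH_HLIST_ARRAY:
        tt->refresh_task_table = refresh_hlist_task_table;
        return 0;
    case PID_HASH_HLIST_PTR:
        tt->refresh_task_table = refresh_hlist_task_table_v2;
        return 0;
    default:
        tt->refresh_task_table = NULL;
        return -1;
    }
}

/* Name of the stub that ran most recently. */
const char *last_refresh_called(void)
{
    return last_called;
}
```

If the kernel's declaration and the selected function fall out of step
(the "wrong function" or "hybrid" case), the gatherer walks the hash with
the wrong layout in mind, which would fit the kind of garbled task output
described in this thread.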
For whatever reason it almost looks like the task-gathering is
using the wrong function, or maybe given back-ports and such,
the SUSE kernel task-handling is now a "hybrid" that would need
its own task-gathering function in the crash utility.
With respect to the "last_ran" addition, you could always rebuild
a kernel with that field moved to the end of the task_struct,
run that kernel, and see what happens. If the "ps" task output
is still screwed up, then it should rule that out as the problem
at hand.
Dave
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility