----- Original Message -----
>
>
> > -----Original Message-----
> > ----- Original Message -----
> > > I faced an incomplete vmcore last week which was generated because
> > > system running in kdump kernel was somehow rebooted in the middle of
> > > copying vmcore.
> > >
> > > Unfortunately, in the incomplete vmcore, most of the tasks failed to
> > > be detected via PID hash table because the objects relevant to PID
> > > hash including ptes needed to refer to the objects were lost.
> > >
> > > Although I successfully found many task_struct objects from
> > > other data structures such as via a circular list of task_struct::tasks
> > > and via run queue, crash sub-commands never work with the following
> > > message if a given object is not contained in the task table:
> > >
> > >     crash> bt 0xffffffff00000000
> > >     bt: invalid task or pid value: 0xffffffff00000000
> > >
> > > To address this issue, I made a patch to add a command-line option
> > > to pass a list of addresses of task_struct objects to make crash
> > > try to detect them in task table.
> > >
> > > I made this in very short time and there may be a better interface
> > > than a command-line option.
> > >
> > > Tested on top of crash-7.2.5.
> >
> > Yeah, what bothers me about this patch is that even though it worked
> > for your particular half-baked vmcore, it may never be of any help to
> > anybody else in the future.
> >
> > It's similar in nature to patches that have been posted that address a
> > particular unique kernel bug that was seen in one vmcore, but it would
> > be highly unlikely that the circumstances would ever be seen again.
>
> I'm posting this patch because I think this could be useful for everyone...
>
> It was unfortunate that incomplete vmcore was generated and sent to us but
> there are cases where engineers have to investigate issues based on the
> incomplete vmcore.
>
> I also think there would be other cases where restoring task_struct
> could fail due to pure software issues, for example, memory corruption
> bugs, and think it natural that crash doesn't behave as expected when
> kernel data structures are in an abnormal state.
>
> I think options such as --minimal and --no_kmem_cache are to deal with
> such cases and this feature is similar in this sense.

Yes, but --minimal is for extreme situations, and doesn't really apply
in this case given the small subset of available commands. --no_kmem_cache
is more for situations where for whatever reason the slab subsystem
initialization fails but it's not necessary to kill the whole session.

There is "crash --active", which gathers just the active tasks on each cpu
without using the normal task-gathering function. Were you aware of that
option?

>
> By the way, I feel like I saw vmcores where some error messages
> were output during "(gathering task table data)" in the past and
> I guess some tasks were missing there but this was the first case I actually
> needed to try to restore them.
>
> >
> > But in this case, it's even more unlikely given that it's dealing with
> > an incomplete vmcore. You were lucky that you were able to even
> > bring up a crash session at all -- and then were able to generate
> > a task list after that.
>
> It was incomplete but was about 98% complete. The detection from PID
> hash was affected by loss of the remaining 2%.
>
> >
> > Following the task_struct.tasks list doesn't gather all of the
> > tasks in a task group, so it doesn't create a fully populated task
> > list, correct?
>
> Yes, I needed to repeat iterating successfully detected task_struct
> objects in order until all the target tasks were covered, and as you
> guessed, there's no guarantee that I found all the task_struct objects,
> so I said 'many of'.

Ok, I suppose it's useful but not necessary to gather all tasks.
As I mentioned before, --active will gather the tasks running at the time
of the crash, which are typically the most important. But I understand
your point, where there may be other non-active tasks whose backtrace or
whatever may be useful.

>
> >
> > Plus it doesn't make sense to add it unless it's documented *how* to
> > create the task list to begin with.
>
> How about writing use case in help message or/and manual page?
>
> -T file
>     Make crash detect task_struct objects listed in file as in
>     task table. This is useful when your interesting tasks are
>     missing in task table. You may find your interesting
>     task_struct objects from various kernel data structures:
>
>     From task_struct::tasks:
>
>     crash> list task_struct.tasks -s task_struct.pid,comm -h ffff88003daa8000
>     ffff88003daa8000
>       pid = 1
>       comm = "systemd\000\060\000\000\000\000\000\000"
>     ffff88003daa8fd0
>       pid = 2
>       comm = "kthreadd\000\000\000\000\000\000\000"
>     ffff88003daa9fa0
>       pid = 3
>       comm = "ksoftirqd/0\000\000\000\000"
>     ...<snip>...
>
>     From runqueue:
>
>     crash> runq
>     CPU 0 RUNQUEUE: ffff88003fc16cc0
>       CURRENT: PID: 2188 TASK: ffff8800360f8000 COMMAND: "foobar.sh"
>       RT PRIO_ARRAY: ffff88003fc16e50
>          [no tasks queued]
>       CFS RB_ROOT: ffff88003fc16d68
>          [no tasks queued]
>
>     CPU 1 RUNQUEUE: ffff88003fd16cc0
>       CURRENT: PID: 1 TASK: ffff88003daa8000 COMMAND: "systemd"
>       RT PRIO_ARRAY: ffff88003fd16e50
>          [no tasks queued]
>       CFS RB_ROOT: ffff88003fd16d68
>          [120] PID: 19054 TASK: ffff88000b684f10 COMMAND: "kworker/1:0"
>          [120] PID: 3863 TASK: ffff88003bd02f70 COMMAND: "emacs"
>
> This might be lengthy under option section.

Right, it could be more verbose under the help page section, but less so
in the man page and in "crash --help".

>
> >
> > I don't know, let me think about this...
>
> I don't think the current design is best.
>
> For example, it might be better to be able to update task table at runtime
> by some crash sub-command.
> Looking at source code, it appears that
> task table is updated at command execution when needed, so it's not
> so difficult?
>
> void
> exec_command(void)
> {
> ...<snip>...
>         if ((ct = get_command_table_entry(args[0]))) {
>                 if (ct->flags & REFRESH_TASK_TABLE) {
>                         if (XEN_HYPER_MODE()) {
> #ifdef XEN_HYPERVISOR_ARCH
>                                 xen_hyper_refresh_domain_context_space();
>                                 xen_hyper_refresh_vcpu_context_space();
> #else
>                                 error(FATAL, XEN_HYPERVISOR_NOT_SUPPORTED);
> #endif
>                         } else if (!(pc->flags & MINIMAL_MODE)) {
>                                 tt->refresh_task_table();
>                                 sort_context_array();
>                                 sort_tgid_array();
>                         }
>                 }

Right, although each of the several task table gathering plugin functions
runs only once if it's a dumpfile. But that could be adjusted for a
run-time command option that wants to add one or more tasks to the table.

Perhaps refresh_active_task_table() could be modified to allow the
addition of extra tasks, either by command line option, or by allowing the
function to be re-run during runtime as directed by a
"task -a task-address" or "task -A file" option.

Thanks,
  Dave
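
For illustration only, here is a minimal sketch of the input side of a
"-T file" or "task -A file" style option: reading task_struct addresses
from a text file. It is plain C and touches no crash internals. The
function names parse_task_line() and read_task_list(), and the file format
(one hexadecimal address per line, blank lines and '#' comments ignored),
are assumptions made for this example. They are not taken from crash-7.2.5
or from the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of crash): parse one line of a task
 * list file. Returns 1 and stores the address on success, 0 for blank
 * lines and '#' comments, and -1 for malformed input.
 */
int parse_task_line(const char *line, unsigned long long *addr)
{
	char *end;

	assert(line != NULL && addr != NULL);

	/* Skip leading blanks. */
	while (*line == ' ' || *line == '\t')
		line++;
	if (*line == '\0' || *line == '\n' || *line == '#')
		return 0;

	/* Base 16 accepts an optional "0x" prefix. */
	errno = 0;
	*addr = strtoull(line, &end, 16);
	if (end == line || errno == ERANGE)
		return -1;

	/* Only trailing whitespace may follow the address. */
	while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
		end++;
	return *end == '\0' ? 1 : -1;
}

/*
 * Hypothetical helper: read at most max addresses from fp into tasks[].
 * Returns the number of addresses stored, or -1 on the first malformed
 * line.
 */
int read_task_list(FILE *fp, unsigned long long *tasks, int max)
{
	char buf[128];
	int n = 0;

	while (n < max && fgets(buf, sizeof(buf), fp)) {
		unsigned long long addr;
		int rc = parse_task_line(buf, &addr);

		if (rc < 0)
			return -1;
		if (rc > 0)
			tasks[n++] = addr;
	}
	return n;
}
```

Each parsed address would then be handed to whatever routine ends up
adding a task to the table. That part depends on crash internals and is
deliberately left out of the sketch.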