> -----Original Message-----
> From: crash-utility-bounces@xxxxxxxxxx
> [mailto:crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson
> Sent: Thursday, January 17, 2019 11:17 PM
> To: Discussion list for crash utility usage, maintenance and development
> <crash-utility@xxxxxxxxxx>
> Subject: Re: [PATCH] Add -T option to configure task table via a file
>
> ----- Original Message -----
> >
> > > -----Original Message-----
> > > ----- Original Message -----
> > > > I faced an incomplete vmcore last week, which was generated because
> > > > the system running the kdump kernel was somehow rebooted in the
> > > > middle of copying the vmcore.
> > > >
> > > > Unfortunately, in the incomplete vmcore, most of the tasks failed to
> > > > be detected via the PID hash table, because the objects relevant to
> > > > the PID hash, including the ptes needed to refer to those objects,
> > > > were lost.
> > > >
> > > > Although I successfully found many of the task_struct objects through
> > > > other data structures, such as the circular list task_struct::tasks
> > > > and the run queues, crash sub-commands never work and fail with the
> > > > following message if a given object is not contained in the task
> > > > table:
> > > >
> > > > crash> bt 0xffffffff00000000
> > > > bt: invalid task or pid value: 0xffffffff00000000
> > > >
> > > > To address this issue, I made a patch that adds a command-line option
> > > > for passing a list of addresses of task_struct objects, so that crash
> > > > tries to include them in the task table.
> > > >
> > > > I made this in a very short time, and there may be a better interface
> > > > than a command-line option.
> > > >
> > > > Tested on top of crash-7.2.5.
> > >
> > > Yeah, what bothers me about this patch is that even though it worked
> > > for your particular half-baked vmcore, it may never be of any help to
> > > anybody else in the future.
> > >
> > > It's similar in nature to patches that have been posted to address a
> > > particular unique kernel bug seen in one vmcore, where it would be
> > > highly unlikely that the circumstances would ever be seen again.
> >
> > I'm posting this patch because I think it could be useful for
> > everyone...
> >
> > It was unfortunate that an incomplete vmcore was generated and sent to
> > us, but there are cases where engineers have to investigate issues based
> > on an incomplete vmcore.
> >
> > I also think there would be other cases where restoring task_struct
> > objects could fail due to pure software issues, for example memory
> > corruption bugs, and I think it is natural that crash doesn't behave as
> > expected when kernel data structures are in an abnormal state.
> >
> > I think options such as --minimal and --no_kmem_cache exist to deal with
> > such cases, and this feature is similar in that sense.
>
> Yes, but --minimal is for extreme situations, and doesn't really apply in
> this case given the small subset of available commands. --no_kmem_cache is
> more for situations where, for whatever reason, the slab subsystem
> initialization fails but it's not necessary to kill the whole session.
>
> There is "crash --active", which gathers just the active tasks on each cpu
> without using the normal task-gathering function. Were you aware of that
> option?

I didn't know about that option. But this time I needed to gather inactive
tasks as well, so that option would have been insufficient.
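For illustration, the core of the patch idea is just to read a list of
task_struct addresses from a file and hand each one to the task table. Below
is a minimal, self-contained sketch of that parsing step, assuming one hex
address per line in the file; add_task_to_table() is only a stand-in for the
real insertion, not the actual patch code:

    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for the real task-table insertion; it only reports here. */
    static void add_task_to_table(unsigned long long addr)
    {
        printf("would add task_struct at 0x%llx to the task table\n", addr);
    }

    /* Read one task_struct address per line (hex, "0x" prefix optional)
     * and hand each one to the stand-in above. */
    static void read_task_list(const char *path)
    {
        char line[256];
        FILE *fp = fopen(path, "r");

        if (!fp) {
            perror(path);
            return;
        }

        while (fgets(line, sizeof(line), fp)) {
            unsigned long long addr = strtoull(line, NULL, 16);
            if (addr)
                add_task_to_table(addr);
        }
        fclose(fp);
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)
            read_task_list(argv[1]);
        return 0;
    }

Such an input file could be assembled, for example, from the addresses
printed by the "list task_struct.tasks" and "runq" commands shown further
down in this thread.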
> > By the way, I feel like I have seen vmcores in the past where some error
> > messages were output during "(gathering task table data)", and I guess
> > some tasks were missing there, but this was the first case where I
> > actually needed to try to restore them.
>
> > > But in this case, it's even more unlikely given that it's dealing with
> > > an incomplete vmcore. You were lucky that you were able to even
> > > bring up a crash session at all -- and then were able to generate
> > > a task list after that.
> >
> > It was incomplete, but it was about 98% complete. The detection from the
> > PID hash was affected by the loss of the remaining 2%.
>
> > > Following the task_struct.tasks list doesn't gather all of the
> > > tasks in a task group, so it doesn't create a fully populated task
> > > list, correct?
> >
> > Yes, I needed to repeatedly iterate over the successfully detected
> > task_struct objects until all the target tasks were covered, and as you
> > guessed, there's no guarantee that I found all the task_struct objects,
> > which is why I said 'many of' them.
>
> Ok, I suppose it's useful but not necessary to gather all tasks. As I
> mentioned before, --active will gather the tasks running at the time of
> the crash, which are typically the most important.
>
> But I understand your point, where there may be other non-active tasks
> whose backtrace or whatever may be useful.

Yes, actually, this time I found an inactive task that had entered sleep due
to some kernel bug, and that inactive task was initially not gathered.

> > > Plus it doesn't make sense to add it unless it's documented *how* to
> > > create the task list to begin with.
> >
> > How about describing a use case in the help message and/or manual page?
> >
> >   -T file
> >       Make crash detect the task_struct objects listed in file as if
> >       they were in the task table. This is useful when the tasks you are
> >       interested in are missing from the task table. You may find such
> >       task_struct objects in various kernel data structures:
> >
> >       From task_struct::tasks:
> >
> >       crash> list task_struct.tasks -s task_struct.pid,comm -h ffff88003daa8000
> >       ffff88003daa8000
> >         pid = 1
> >         comm = "systemd\000\060\000\000\000\000\000\000"
> >       ffff88003daa8fd0
> >         pid = 2
> >         comm = "kthreadd\000\000\000\000\000\000\000"
> >       ffff88003daa9fa0
> >         pid = 3
> >         comm = "ksoftirqd/0\000\000\000\000"
> >       ...<snip>...
> >
> >       From the runqueue:
> >
> >       crash> runq
> >       CPU 0 RUNQUEUE: ffff88003fc16cc0
> >         CURRENT: PID: 2188   TASK: ffff8800360f8000  COMMAND: "foobar.sh"
> >         RT PRIO_ARRAY: ffff88003fc16e50
> >            [no tasks queued]
> >         CFS RB_ROOT: ffff88003fc16d68
> >            [no tasks queued]
> >
> >       CPU 1 RUNQUEUE: ffff88003fd16cc0
> >         CURRENT: PID: 1      TASK: ffff88003daa8000  COMMAND: "systemd"
> >         RT PRIO_ARRAY: ffff88003fd16e50
> >            [no tasks queued]
> >         CFS RB_ROOT: ffff88003fd16d68
> >            [120] PID: 19054  TASK: ffff88000b684f10  COMMAND: "kworker/1:0"
> >            [120] PID: 3863   TASK: ffff88003bd02f70  COMMAND: "emacs"
> >
> > This might be lengthy under the option section.
>
> Right, it could be more verbose under the help page section, but less so
> in the man page and in "crash --help".

I see.

> > > I don't know, let me think about this...
> >
> > I don't think the current design is the best one.
> >
> > For example, it might be better to be able to update the task table at
> > runtime via some crash sub-command. Looking at the source code, it
> > appears that the task table is updated at command execution time when
> > needed, so it might not be so difficult?
> >
> > void
> > exec_command(void)
> > {
> > ...<snip>...
> >     if ((ct = get_command_table_entry(args[0]))) {
> >             if (ct->flags & REFRESH_TASK_TABLE) {
> >                     if (XEN_HYPER_MODE()) {
> > #ifdef XEN_HYPERVISOR_ARCH
> >                             xen_hyper_refresh_domain_context_space();
> >                             xen_hyper_refresh_vcpu_context_space();
> > #else
> >                             error(FATAL, XEN_HYPERVISOR_NOT_SUPPORTED);
> > #endif
> >                     } else if (!(pc->flags & MINIMAL_MODE)) {
> >                             tt->refresh_task_table();
> >                             sort_context_array();
> >                             sort_tgid_array();
> >                     }
> >             }
>
> Right, although each of the several task-table gathering plugin functions
> only runs one time if it's a dumpfile. But that could be adjusted for a
> run-time command option that wants to add one or more tasks to the table.
>
> Perhaps refresh_active_task_table() could be modified to allow the
> addition of extra tasks, either by a command-line option, or by allowing
> the function to be re-run at runtime as directed by a "task -a task-address"
> or "task -A file" option.

Thanks for the idea. I'll try to make a patch based on it. By the way, I
guess that will be next week or later given my current schedule.

Thanks.
HATAYAMA, Dais

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
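To illustrate the direction suggested above, here is a minimal sketch of how
a run-time "task -A file" handler could reuse the refresh/sort sequence that
exec_command() already performs for commands flagged REFRESH_TASK_TABLE.
Everything below is a stand-in written for illustration; it is not crash's
real task_table, and only the call order mirrors the excerpt quoted above:

    #include <stdio.h>

    /* Stand-in for crash's task_table; only the refresh callback matters here. */
    struct task_table_stub {
        void (*refresh_task_table)(void);
    };

    static void refresh_task_table_stub(void)
    {
        printf("re-gathering task table, including extra addresses\n");
    }

    static struct task_table_stub tt_stub = { refresh_task_table_stub };

    /* Placeholders for the sort calls that follow the refresh in exec_command(). */
    static void sort_context_array(void) { }
    static void sort_tgid_array(void) { }

    /*
     * Hypothetical handler for a "task -A file" option: remember the extra
     * task_struct addresses (omitted here), then re-run the same
     * refresh/sort sequence that exec_command() performs for commands
     * flagged REFRESH_TASK_TABLE.
     */
    static void task_add_from_file(const char *path)
    {
        printf("reading extra task_struct addresses from %s\n", path);
        tt_stub.refresh_task_table();
        sort_context_array();
        sort_tgid_array();
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)
            task_add_from_file(argv[1]);
        return 0;
    }

The point of the sketch is only the control flow: record the extra addresses
first, then let the normal refresh path pick them up and re-sort the context
arrays.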