----- Original Message -----
>
> Modified patch attached. It is rebased to latest crash version.
> The arguments are in the form of ordered pair as you had mentioned. I
> have tested it with arm and armv8 ramdumps.
>
> Do we really need dump_ramdump_def ? As the dump is converted to
> kdump and we use the kdump flag in pc->flags, help -D and help -n
> works fine using kdump dump functions. Did I miss something ?
>
> I will send you the link to arm64 ramdump in another email.
>
> Thanks,
> Vinayak

I tested your latest patch on the sample ARM and ARM64 RAM dumps you
sent me.

As far as the patch itself is concerned, I ran into a problem: if crash
is invoked in a directory where it does not have write permission, the
session hangs trying to write to a bad file descriptor, because of this:

        fd2 = open(out_elf, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
        if (!fd2) {
                error(INFO, "%s open error\n", out_elf);
                goto end1;
        }

It should be "if (fd2 < 0)", since open() returns -1 on failure and 0
is a valid file descriptor, so the "!fd2" test never catches the error.

But more to the point, in my earlier response, I had suggested this:

> With respect to the [-o output_file], and given the potential
> simplicity of the argument string, I think it should be
> optional. You could do something like this in the getopt()
> handler, and have the ELF output_file name pre-stored in the
> ramdump_def structure:
>
> +       case 'o':
> +               ramdump_elf_output_file(optarg);
> +               break;
>
> If "-o output_file" is NOT used, then ramdump_to_elf() can
> pass back the name of a temporary file.

I should have been clearer with respect to "a temporary file". What I
was suggesting was that you do something like using mkstemp(3) to
create a temporary file in /var/tmp, and then unlink() it immediately,
so that it would only exist until the crash session ends.

As for the dumpfiles themselves, the 32-bit ARM dumpfile can be
analyzed OK, but as you noted, the ARM64 dump requires "--cpus 4" to
come up OK, which really should not be required.
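Returning to the open() check and the temporary-file suggestion above, here is a minimal sketch of how the output-file handling could look. It assumes a hypothetical helper, open_ramdump_elf_output(), that is handed the "-o" file name (or NULL when "-o" is not given); it is not code from the patch. It tests open() against -1 rather than 0, and in the no-"-o" case it uses mkstemp(3) in /var/tmp followed by an immediate unlink(). It also assumes the caller writes and later reads the ELF data through the returned descriptor rather than reopening the file by name, since the name is gone after the unlink().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): open the ELF output file.
 * out_elf is the "-o output_file" name, or NULL if "-o" was not given.
 * Returns an open read/write descriptor, or -1 on failure.
 */
static int open_ramdump_elf_output(const char *out_elf)
{
    int fd;

    if (out_elf) {
        fd = open(out_elf, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
        /* open(2) reports failure with -1; 0 is a valid descriptor. */
        if (fd < 0) {
            fprintf(stderr, "%s open error\n", out_elf);
            return -1;
        }
        return fd;
    }

    /* mkstemp(3) fills in the XXXXXX suffix and creates the file mode 0600. */
    char tmpl[] = "/var/tmp/ramdump_elf_XXXXXX";
    fd = mkstemp(tmpl);
    if (fd < 0) {
        perror("mkstemp");
        return -1;
    }

    /*
     * Remove the name right away. The data stays reachable through fd,
     * and the kernel frees it once the last descriptor is closed, which
     * happens at the latest when the process exits.
     */
    if (unlink(tmpl) < 0)
        perror("unlink");   /* not fatal: the file just lingers in /var/tmp */
    return fd;
}
```

The design choice here is that nothing is left behind in /var/tmp even if the session is killed; the trade-off is that any code needing a pathname, rather than a descriptor, would have to be handled differently.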
Investigating the reason for the "--cpus 4" requirement, it's helpful
to compare the two dumps.

With your sample 32-bit ARM dumpfile, although it comes up OK with 4
cpus, note that only cpu 1 is marked online in the kernel:

crash> help -k
...
    cpu_possible_map: 0 1 2 3
     cpu_present_map: 0 1 2 3
      cpu_online_map: 1
      cpu_active_map: 0 1 2 3
...

The 32-bit arm.c arm_get_smp_cpus() function calculates the number of
cpus like this:

        return MAX(get_cpus_active(), get_cpus_online());

so it returns 4, since the "active" map shows all 4 cpus.

The "ps" command shows tasks associated with all 4 cpus, and the
runqueues look like this, where cpus 0 and 2 are running their idle
tasks, and cpus 1 and 3 are running user-mode tasks:

crash> runq
CPU 0 RUNQUEUE: c0f286c0
  CURRENT: PID: 0      TASK: c0a5d8b0  COMMAND: "swapper/0"
  RT PRIO_ARRAY: c0f287a0
     [no tasks queued]
  CFS RB_ROOT: c0f28730
     [no tasks queued]

CPU 1 RUNQUEUE: c0f316c0
  CURRENT: PID: 13429  TASK: db944580  COMMAND: "AudioIn_5F8"
  RT PRIO_ARRAY: c0f317a0
     [no tasks queued]
  CFS RB_ROOT: c0f31730
     [120] PID: 474    TASK: d9a36ac0  COMMAND: "kworker/1:1"
     [120] PID: 2890   TASK: c89b2580  COMMAND: "sh"

CPU 2 RUNQUEUE: c0f3a6c0
  CURRENT: PID: 0      TASK: db63a040  COMMAND: "swapper/2"
  RT PRIO_ARRAY: c0f3a7a0
     [no tasks queued]
  CFS RB_ROOT: c0f3a730
     [112] PID: 1599   TASK: db87d040  COMMAND: "mm_device_threa"

CPU 3 RUNQUEUE: c0f436c0
  CURRENT: PID: 1949   TASK: db951040  COMMAND: "WindowManager"
  RT PRIO_ARRAY: c0f437a0
     [no tasks queued]
  CFS RB_ROOT: c0f43730
     [no tasks queued]
crash>

So it does seem that whatever mechanism you use to take the raw RAM
dump on the 32-bit ARM offlines cpus first?

Now, on the ARM64 dumpfile, if I force it to come up with "--cpus 4",
it shows that only cpu 0 is online, present and active:

crash> help -k
...
    cpu_possible_map: 0 1 2 3
     cpu_present_map: 0
      cpu_online_map: 0
      cpu_active_map: 0
...
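The cpu-count arithmetic can be checked with a small model. This sketch represents each cpu map as a 64-bit bitmask and re-implements two expressions: the 32-bit MAX(get_cpus_active(), get_cpus_online()) and the arm64_get_smp_cpus() expression, MAX(get_cpus_online(), get_highest_cpu_online()+1). The helpers are hypothetical stand-ins, not crash's implementations; they assume that get_cpus_*() counts the cpus set in the corresponding map and that get_highest_cpu_online() returns the highest online cpu number.

```c
#include <stdint.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical model (not crash's code): a cpu map is a 64-bit mask in
 * which bit N set means cpu N is in the map.
 */
static int cpus_in_map(uint64_t map)
{
    int count = 0;

    while (map) {
        map &= map - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Highest cpu number present in the map, or -1 if the map is empty. */
static int highest_cpu_in_map(uint64_t map)
{
    int cpu = -1;

    for (int i = 0; i < 64; i++)
        if (map & ((uint64_t)1 << i))
            cpu = i;
    return cpu;
}

/* Models the 32-bit arm_get_smp_cpus() expression. */
static int arm_style_cpus(uint64_t online, uint64_t active)
{
    return MAX(cpus_in_map(active), cpus_in_map(online));
}

/* Models the arm64_get_smp_cpus() expression. */
static int arm64_style_cpus(uint64_t online)
{
    return MAX(cpus_in_map(online), highest_cpu_in_map(online) + 1);
}
```

With the 32-bit ARM maps (online = {1}, active = {0,1,2,3}), arm_style_cpus(0x2, 0xF) evaluates to 4. With the ARM64 maps (online = active = {0}), both arm_style_cpus(0x1, 0x1) and arm64_style_cpus(0x1) evaluate to 1, so neither expression can arrive at 4 without "--cpus 4".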
I can understand that perhaps cpus are offlined prior to taking the RAM
dump, but isn't it strange that the "present" and "active" maps are
also the same as the "online" map?

Currently the arm64.c arm64_get_smp_cpus() function returns the number
of cpus like this:

        return MAX(get_cpus_online(), get_highest_cpu_online()+1);

so it returns 1. Even if it calculated it the same way as the 32-bit
ARM, it would still return 1 because of the active map. So we have to
force it to return 4 with "--cpus 4".

But having done that, oddly enough, the "runq" command shows this,
where the "CURRENT" task on cpu 0 is "0":

crash> runq
CPU 0 RUNQUEUE: ffffffc03ffb6e40
  CURRENT: 0
  RT PRIO_ARRAY: ffffffc03ffb6fb0
     [no tasks queued]
  CFS RB_ROOT: ffffffc03ffb6f10
     [no tasks queued]

CPU 1 RUNQUEUE: ffffffc03ffc1e40
  CURRENT: PID: 0      TASK: ffffffc03ecb4b00  COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffffffc03ffc1fb0
     [no tasks queued]
  CFS RB_ROOT: ffffffc03ffc1f10
     [no tasks queued]

CPU 2 RUNQUEUE: ffffffc03ffcce40
  CURRENT: PID: 0      TASK: ffffffc03ecb5dc0  COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffffffc03ffccfb0
     [no tasks queued]
  CFS RB_ROOT: ffffffc03ffccf10
     [no tasks queued]

CPU 3 RUNQUEUE: ffffffc03ffd7e40
  CURRENT: PID: 0      TASK: ffffffc03ecf0000  COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffffffc03ffd7fb0
     [no tasks queued]
  CFS RB_ROOT: ffffffc03ffd7f10
     [no tasks queued]
crash>

I have never seen this before. As I understand it, if no other task is
queued and running on a cpu, then it defaults to the idle/swapper task
for that cpu, whose address is hard-wired in the per-cpu runqueue
structure.
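The expectation described above can be written as a small consistency check. The structs below are hypothetical, heavily simplified stand-ins for the kernel's task and runqueue structures (only the curr and idle task pointers are modeled, not the real layout), and classify_rq() is an illustrative helper, not a crash function. The assumption encoded here is that on a cpu that has been brought up, curr is never NULL: it is either some runnable task or that cpu's own idle task.

```c
#include <stddef.h>

/* Hypothetical minimal task and runqueue models; not the kernel layouts. */
struct task_model {
    int pid;
    const char *comm;
};

struct rq_model {
    const struct task_model *curr;   /* task currently running on the cpu */
    const struct task_model *idle;   /* the cpu's idle/swapper task */
};

enum rq_state {
    RQ_RUNNING_TASK,     /* curr is some non-idle task */
    RQ_RUNNING_IDLE,     /* nothing else to run, so curr == idle */
    RQ_INCONSISTENT,     /* curr is NULL but an idle task exists */
    RQ_UNINITIALIZED     /* curr and idle are both NULL */
};

/* Classify a runqueue according to the curr/idle invariant. */
static enum rq_state classify_rq(const struct rq_model *rq)
{
    if (rq->curr == NULL)
        return rq->idle == NULL ? RQ_UNINITIALIZED : RQ_INCONSISTENT;
    if (rq->curr == rq->idle)
        return RQ_RUNNING_IDLE;
    return RQ_RUNNING_TASK;
}
```

Fed the rq.curr and rq.idle values from the ARM64 dump, cpu 0 would classify as RQ_UNINITIALIZED and cpus 1 through 3 as RQ_RUNNING_IDLE, while cpu 1 of the 32-bit ARM dump ("AudioIn_5F8") would classify as RQ_RUNNING_TASK.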
But if I look at the rq structure for cpu 0, not only is the "curr"
task pointer NULL, the "idle" task pointer is NULL as well:

crash> rq.curr,idle,cpu ffffffc03ffb6e40
  curr = 0x0
  idle = 0x0
  cpu = 0
crash>

whereas the other 3 cpus show that they are running their idle tasks:

crash> rq.curr,idle,cpu ffffffc03ffc1e40
  curr = 0xffffffc03ecb4b00
  idle = 0xffffffc03ecb4b00
  cpu = 1
crash> rq.curr,idle,cpu ffffffc03ffcce40
  curr = 0xffffffc03ecb5dc0
  idle = 0xffffffc03ecb5dc0
  cpu = 2
crash> rq.curr,idle,cpu ffffffc03ffd7e40
  curr = 0xffffffc03ecf0000
  idle = 0xffffffc03ecf0000
  cpu = 3
crash>

Perhaps it has something to do with *when* you took the dump. The
"sys" command shows an UPTIME of 00:00:00, and the DATE is simply a
time value of 0 (the start of the Unix epoch) displayed in a UTC-5
time zone:

crash> sys
      KERNEL: /home/anderson/Downloads/tmp_ARM64/vmlinux
    DUMPFILE: ramdump_elf
        CPUS: 4
        DATE: Wed Dec 31 19:00:00 1969
      UPTIME: 00:00:00
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 34
    NODENAME: (none)
     RELEASE: 3.10.33+
     VERSION: #22 SMP PREEMPT Tue May 6 16:23:34 IST 2014
     MACHINE: aarch64  (unknown Mhz)
      MEMORY: 1 GB
       PANIC: ""
crash>

And the "ps" command doesn't show any user-space tasks running, not
even "init" PID 1, and the funky idle/swapper task on cpu 0 shows a
PID of 1:

crash> ps
    PID   PPID CPU        TASK        ST  %MEM     VSZ    RSS  COMM
>     0     -1   1  ffffffc03ecb4b00  RU   0.0       0      0  [swapper/1]
>     0     -1   2  ffffffc03ecb5dc0  RU   0.0       0      0  [swapper/2]
>     0     -1   3  ffffffc03ecf0000  RU   0.0       0      0  [swapper/3]
      1     -1   0  ffffffc03ec78000  UN   0.0       0      0  [swapper/0]
      2     -1   0  ffffffc03ec792c0  IN   0.0       0      0  [kthreadd]
      3      2   0  ffffffc03ec7a580  IN   0.0       0      0  [ksoftirqd/0]
      4      2   0  ffffffc03ec7b840  IN   0.0       0      0  [kworker/0:0]
      5      2   0  ffffffc03ec7cb00  IN   0.0       0      0  [kworker/0:0H]
      6      2   0  ffffffc03ec7ddc0  IN   0.0       0      0  [kworker/u8:0]
      7      2   0  ffffffc03ecb0000  IN   0.0       0      0  [migration/0]
      8      2   0  ffffffc03ecb12c0  IN   0.0       0      0  [rcu_preempt]
      9      2   0  ffffffc03ecb2580  IN   0.0       0      0  [rcu_bh]
     10      2   0  ffffffc03ecb3840  IN   0.0       0      0  [rcu_sched]
     11      2   1  ffffffc03ecf12c0  ??   0.0       0      0  [migration/1]
     12      2   1  ffffffc03ecf2580  ??   0.0       0      0  [ksoftirqd/1]
     13      2   1  ffffffc03ecf3840  IN   0.0       0      0  [kworker/1:0]
     14      2   1  ffffffc03ecf4b00  IN   0.0       0      0  [kworker/1:0H]
     15      2   2  ffffffc03ecf5dc0  ??   0.0       0      0  [migration/2]
     16      2   2  ffffffc03ed20000  IN   0.0       0      0  [ksoftirqd/2]
     17      2   0  ffffffc03ed212c0  UN   0.0       0      0  [kworker/2:0]
     18      2   2  ffffffc03ed22580  IN   0.0       0      0  [kworker/2:0H]
     19      2   3  ffffffc03ed23840  IN   0.0       0      0  [migration/3]
     20      2   3  ffffffc03ed24b00  IN   0.0       0      0  [ksoftirqd/3]
     21      2   3  ffffffc03ed25dc0  IN   0.0       0      0  [kworker/3:0]
     22      2   3  ffffffc03ed40000  IN   0.0       0      0  [kworker/3:0H]
     23      2   0  ffffffc03ed412c0  IN   0.0       0      0  [khelper]
     24      2   0  ffffffc03ed42580  IN   0.0       0      0  [kdevtmpfs]
     25      2   0  ffffffc03ed43840  IN   0.0       0      0  [kworker/u8:1]
     56      2   0  ffffffc03ededdc0  IN   0.0       0      0  [bcm_ipc_ch0]
     57      2   0  ffffffc03edecb00  IN   0.0       0      0  [bcm_ipc_ch11]
    180      2   0  ffffffc03ee5ddc0  IN   0.0       0      0  [writeback]
    182      2   0  ffffffc03ee912c0  IN   0.0       0      0  [bioset]
    184      2   0  ffffffc03ede92c0  IN   0.0       0      0  [kworker/u9:0]
    185      2   0  ffffffc03ede8000  IN   0.0       0      0  [kblockd]
crash>

So I'm guessing that this dumpfile was taken before the "init" task was
even created, and the kernel data structures were not fully
initialized? Maybe you can try taking a RAM dump on an ARM64 machine
after it is up and running?

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility