Hi,

I've found a problem in crash 4.0-2. On an x86_64 system, crash hits a
segmentation fault when "bt -f" is executed on a dumpfile created by NMI.

My system is as follows.

  CPU          : AMD Opteron(tm) Processor 252
  arch         : x86_64
  memory       : 16GB
  kernel       : 2.6.9-22.EL (RHEL4-U2)
  crash        : 4.0-2 (RHEL4-U2)
  diskdumputils: 1.1.9-4 (RHEL4-U2)

The reproduction steps are as follows.

  1. Boot the x86_64 kernel.
  2. Start the diskdump service.
  3. Trigger diskdump by pushing the NMI button.
  4. Reboot the x86_64 kernel.
  5. Get the dumpfile by starting the diskdump service.
  6. Start crash and execute "bt -f" on the dumpfile.
  7. A segmentation fault occurs after the exception stack is printed.

After printing the NMI exception frame, x86_64_low_budget_back_trace_cmd
calculates the next bt->frameptr without changing RSP. This leads to the
condition bt->frameptr > rsp at x86_64.c:1097 in x86_64_display_full_frame,
and the loop that follows keeps running until it stops with a segmentation
fault.

The attached patch adds the sanity check (bt->frameptr < rsp) in
x86_64_display_full_frame.

The following example describes this problem when NMI occurs within
"default_idle".

@x86_64_low_budget_back_trace_cmd (x86_64.c:1367)

1. Exception Stack (x86_64.c:1416)
   a. Print the exception stack.
   b. Print the register info (RIP, RSP) saved on the exception stack,
      i.e. the state of the function that was running before the NMI
      exception. RIP points to text in "default_idle". But the area
      pointed to by RSP holds the address of text in "cpu_idle", because
      RSP doesn't change while "default_idle" is running.
   c. bt->frameptr = RSP + sizeof(ulong).

2. Process Stack (x86_64.c:1655)
   a. Try to print the stack of "cpu_idle" in x86_64_display_full_frame.
   b. bt->frameptr > RSP because of step 1.c.
   c. This causes the segmentation fault.

Ken'ichi Ohmichi
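
As a rough illustration of why bt->frameptr > rsp can be fatal, here is a
minimal C sketch. It is not the crash source and not the attached patch:
frame_words and frame_words_unchecked are hypothetical names, and the sketch
assumes the number of stack words to dump is derived from the unsigned
difference rsp - frameptr, which is an assumption about the code near
x86_64.c:1097.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from crash): number of stack words to dump
 * between frameptr and rsp, inclusive. ASSUMPTION: the real code derives
 * its loop count from an unsigned difference like this one.
 */
static uint64_t frame_words_unchecked(uint64_t frameptr, uint64_t rsp)
{
    /* If frameptr > rsp, the subtraction wraps around modulo 2^64. */
    return (rsp - frameptr) / sizeof(uint64_t) + 1;
}

/*
 * Same calculation with a sanity check in front of it: an inverted
 * range yields zero words, so a caller's dump loop would not run.
 */
static uint64_t frame_words(uint64_t frameptr, uint64_t rsp)
{
    if (frameptr > rsp)
        return 0;
    return (rsp - frameptr) / sizeof(uint64_t) + 1;
}
```

With frameptr = rsp + sizeof(ulong), as in step 1.c, the unchecked version
returns 0x2000000000000000 words on a 64-bit system, so a loop driven by that
count would walk far past the end of the stack buffer. The checked version
returns 0 for the same input.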
Attachment:
crash.patch