Hi, Hatayama-san,

I have an extension with mostly the same purpose as your patch, but your patch is better: it supports the latest kernel and also dump-filter masking. My current extension file is attached. Yes, my code is quite buggy, ugly, and handles the latest kernel less well than yours. (Sigh... I did not know about fill_vma_cache(), so I ran "vm -p" every time before dumping.)

BTW, I have some comments. I'd like to add the features below to your patch, or if you would do it, that would make me happy. :)

- support i386
- support ELF32 binaries on x86-64
- support old kernels (before 2.6.17)

As Dave said, if your patch is committed as an extension, I could submit some patches against it. How about this?

Best regards,
Seigo Iguchi

From: HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxxxxx>
Subject: [RFC] gcore subcommand: a process coredump feature
Date: Mon, 02 Aug 2010 18:00:02 +0900 (Tokyo Standard Time)

> Hello,
>
> For some weeks I've been developing a gcore subcommand for the crash
> utility which provides a process coredump feature for kernel crash
> dumps, strongly demanded by users who want to investigate user-space
> applications contained in a kernel crash dump.
>
> I've now finished a prototype version of gcore and identified the
> issues that need to be addressed intensively. Could you give me any
> comments and suggestions on this work?
>
>
> Motivation
> ==========
>
> It's a relatively familiar technique that, in a cluster system, a
> currently running node triggers the crash kernel dump mechanism when it
> detects a kind of critical error, in order for the running,
> error-detecting server to cease as soon as possible. Consequently, the
> resulting crash kernel dump contains a process image of the erroneous
> user application. In that case, developers are interested in user
> space rather than kernel space.
>
> Another merit of gcore is that it allows us to use several
> userland debugging tools, such as GDB and binutils, to
> analyze user-space memory.
>
>
> Current Status
> ==============
>
> I have confirmed that the prototype version runs on the following
> configuration:
>
> Linux Kernel Version: 2.6.34
> Supporting Architecture: x86_64
> Crash Version: 5.0.5
> Dump Format: ELF
>
> I'm planning to widen the range of support as follows:
>
> Linux Kernel Version: Any
> Supporting Architecture: i386, x86_64 and IA64
> Dump Format: Any
>
>
> Issues
> ======
>
> Currently, I have the issues below.
>
> 1) Retrieval of appropriate register values
>
> The prototype version retrieves register values from a _wrong_
> location: the top of the kernel stack, into which register values are
> saved at any preemption context switch. Instead, the register values
> that should be included here are the ones saved at the user-to-kernel
> context switch on any interrupt event.
>
> I've yet to implement this. Specifically, I need to do the following
> tasks from now on:
>
> (1) list all entry points from user-space to kernel-space execution;
>
> (2) divide the entries according to where and how the register
> values from the user-space context are saved;
>
> (3) compose a program that retrieves the saved register values from
> the appropriate locations traced by means of (1) and (2).
>
> Ideally, I think it would be best if the crash library provided a
> means of retrieving this kind of register values, that is, those saved
> on various stack frames. Is there a plan to do so?
>
>
> 2) Getting a signal number for a task that was in the middle of core
> dump processing at kernel crash time
>
> If a target task is partway through the core dump process, it's useful
> to know the signal number in order to know why the task was about to
> be core dumped.
>
> Unfortunately, I have no choice but to backtrace the kernel stack to
> retrieve the signal number saved there as an argument of, for example,
> do_coredump().
>
>
> 3) Kernel version compatibility
>
> crash's policy is to support all kernel versions with the latest crash
> package. On the other hand, the prototype is based on kernel 2.6.34.
> This means more kernel versions need to be supported.
>
> Well, the question is: which versions do I really need to test in
> addition to the latest upstream kernel? I think it's practically
> enough to support RHEL4, RHEL5 and RHEL6.
>
>
> Build Instruction
> =================
>
> $ tar xf crash-5.0.5.tar.gz
> $ cd crash-5.0.5/
> $ patch -p 1 < gcore.patch
> $ make
>
>
> Usage
> =====
>
> Use the help subcommand of the crash utility, as ``help gcore''.
>
>
> Attached File
> =============
>
> * gcore.patch
>
> A patch implementing the gcore subcommand for crash-5.0.5.
>
> The diffstat output is as follows.
>
> $ diffstat gcore.patch
>  Makefile      |   10 +-
>  defs.h        |   15 +
>  gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcore.h       |  639 ++++++++++++++++++++
>  global_data.c |    3 +
>  help.c        |   28 +
>  netdump.c     |   27 +
>  tools.c       |   37 ++
>  8 files changed, 2615 insertions(+), 2 deletions(-)
>
> --
> HATAYAMA Daisuke
> d.hatayama@xxxxxxxxxxxxxx
/*
 * elfdump extension module for crash - make elf coredump image from vmcore
 *
 * Copyright (C) 2010 NEC Communication Systems, Inc. All rights reserved.
 * Author: Seigo Iguchi <iguchi.sg@xxxxxxxxxxxxxx>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2, or (at your option)
 * any later version.
 */
#define _GNU_SOURCE
#include "defs.h"
#include <elf.h>

static int debug = 0;
#define debug(fmt, args...)                                       \
    do {                                                          \
        if (debug)                                                \
            fprintf(fp, "%s:%d:" fmt, __FUNCTION__, __LINE__,     \
                    ## args);                                     \
    } while (0)

static int dump_shmem = 0;

#include <asm/ptrace.h>         /* for pt_regs */
#if defined(X86)
#include <asm/user.h>           /* for user_regs_struct */
#elif defined(X86_64)
#include <asm/user.h>
#include <asm/types.h>
#endif

#ifdef ELF_CORE_COPY_XFPREGS
struct user32_fxsr_struct {
    unsigned short cwd;
    unsigned short swd;
    unsigned short twd;     /* not compatible to 64bit twd */
    unsigned short fop;
    int fip;
    int fcs;
    int foo;
    int fos;
    int mxcsr;
    int reserved;
    int st_space[32];       /* 8*16 bytes for each FP-reg = 128 bytes */
    int xmm_space[32];      /* 8*16 bytes for each XMM-reg = 128 bytes */
    int padding[56];
};
#endif

/* from list.h */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \
    struct list_head name = LIST_HEAD_INIT(name)

#define INIT_LIST_HEAD(ptr) do { \
    (ptr)->next = (ptr); (ptr)->prev = (ptr); \
} while (0)

static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

#define container_of(ptr, type, member) ({                      \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);        \
    (type *)( (char *)__mptr - offsetof(type,member) );})

#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)

struct hlist_head {
    struct hlist_node *first;
};

struct hlist_node {
    struct hlist_node *next, **pprev;
};

/* from <asm-x86_64/bitops.h> */
static __inline__ unsigned long ffz(unsigned long word)
{
    asm("bsf %1,%0" : "=r" (word) : "r" (~word));
    return word;
}

#if defined(X86)
#define ELF_CLASS   ELFCLASS32
#define ELF_DATA    ELFDATA2LSB
#define ELF_ARCH    EM_386
#elif defined(X86_64)
#define ELF_CLASS   ELFCLASS64
#define ELF_DATA    ELFDATA2LSB
#define ELF_ARCH    EM_X86_64
#endif

/* from <asm/elf.h> */
typedef unsigned long elf_greg_t;
#define ELF_NGREG (sizeof (struct user_regs_struct) / sizeof(elf_greg_t))
typedef elf_greg_t elf_gregset_t[ELF_NGREG];

#define ELF_EXEC_PAGESIZE 4096

#include <asm/types.h>
typedef signed char s8;
typedef unsigned char u8;
typedef signed short s16;
typedef unsigned short u16;
typedef signed int s32;
typedef unsigned int u32;
typedef signed long long s64;
typedef unsigned long long u64;

#if ELF_CLASS == ELFCLASS32
#define elfhdr      Elf32_Ehdr
#define elf_phdr    Elf32_Phdr
#define elf_note    Elf32_Nhdr
#else
#define elfhdr      Elf64_Ehdr
#define elf_phdr    Elf64_Phdr
#define elf_note    Elf64_Nhdr
#endif

/* from linux/elf.h */
/* Notes used in ET_CORE */
#define NT_PRFPREG  2
#define NT_PRXFPREG 0x46e62b7f  /* copied from gdb5.1/include/elf/common.h */

typedef struct user_i387_struct elf_fpregset_t;
#ifdef ELF_CORE_COPY_XFPREGS
typedef struct user32_fxsr_struct elf_fpxregset_t;
#endif
#ifdef ELF_CORE_COPY_XFPREGS
typedef elf_fpxregset_t fpxregset_t;
#endif

struct elf_siginfo {
    int si_signo;   /* signal number */
    int si_code;    /* extra code */
    int si_errno;   /* errno */
};

struct elf_prstatus {
#if 0
    long pr_flags;  /* XXX Process flags */
    short pr_why;   /* XXX Reason for process halt */
    short pr_what;  /* XXX More detailed reason */
#endif
    struct elf_siginfo pr_info; /* Info associated with signal */
    short pr_cursig;            /* Current signal */
    unsigned long pr_sigpend;   /* Set of pending signals */
    unsigned long pr_sighold;   /* Set of held signals */
    pid_t pr_pid;
    pid_t pr_ppid;
    pid_t pr_pgrp;
    pid_t pr_sid;
    struct timeval pr_utime;    /* User time */
    struct timeval pr_stime;    /* System time */
    struct timeval pr_cutime;   /* Cumulative user time */
    struct timeval pr_cstime;   /* Cumulative system time */
    elf_gregset_t pr_reg;       /* GP registers */
    int pr_fpvalid;             /* True if math co-processor being used. */
};

typedef unsigned int __kernel_uid_t;
typedef unsigned int __kernel_gid_t;

#define ELF_PRARGSZ (80)    /* Number of chars for args */

struct elf_prpsinfo {
    char pr_state;          /* numeric process state */
    char pr_sname;          /* char for pr_state */
    char pr_zomb;           /* zombie */
    char pr_nice;           /* nice val */
    unsigned long pr_flag;  /* flags */
    __kernel_uid_t pr_uid;
    __kernel_gid_t pr_gid;
    pid_t pr_pid, pr_ppid, pr_pgrp, pr_sid;
    /* Lots missing */
    char pr_fname[16];              /* filename of executable */
    char pr_psargs[ELF_PRARGSZ];    /* initial part of arg list */
};

/* from <linux/pid.h> */
enum pid_type {
    PIDTYPE_PID,
    PIDTYPE_TGID,
    PIDTYPE_PGID,
    PIDTYPE_SID,
    PIDTYPE_MAX
};

struct pid {
    /* Try to keep pid_chain in the same cacheline as nr for find_pid */
    int nr;
    struct hlist_node pid_chain;
    /* list of pids with the same nr, only one of them is in the hash */
    struct list_head pid_list;
};

struct dummy_pids {
    struct pid pids[PIDTYPE_MAX];
};

/* from kernel/exit.c */
struct task_context *next_thread(const struct task_context *p)
{
    char *tp = fill_task_struct(p->task);

    if (THIS_KERNEL_VERSION >= LINUX(2,6,17)) {
        ulong next;
        long task_struct_thread_group =
            MEMBER_OFFSET("task_struct", "thread_group");

        if (tp == NULL) {
            error(INFO, "cannot get fill_task_struct. p=%p pid=%d\n",
                  p, p->pid);
            return NULL;
        }
        next = (ulong) VOID_PTR(tp + task_struct_thread_group);
        return task_to_context(next - task_struct_thread_group);
    } else {
        struct pid *pids;
        long offset;
        ulong *next_tp = NULL;

        pids = (struct pid *) (tp + OFFSET(task_struct_pids));
        offset = OFFSET(task_struct_pids) +
            offsetof(struct dummy_pids, pids[PIDTYPE_TGID].pid_list);
        next_tp = (ulong *)((long)pids[PIDTYPE_TGID].pid_list.next - offset);
        error(INFO, "next tp=%p pids_list.next=%p offset=%d\n",
              next_tp, pids[PIDTYPE_TGID].pid_list.next, offset);
        return task_to_context((ulong)next_tp);
    }
}

static int maydump(ulong vma)
{
/* from mm.h */
#define VM_READ         0x00000001  /* currently active flags */
#define VM_WRITE        0x00000002
#define VM_EXEC         0x00000004
#define VM_SHARED       0x00000008
#define VM_MAYREAD      0x00000010  /* limits for mprotect() etc */
#define VM_MAYWRITE     0x00000020
#define VM_MAYEXEC      0x00000040
#define VM_MAYSHARE     0x00000080
#define VM_GROWSDOWN    0x00000100  /* general info on the segment */
#define VM_GROWSUP      0x00000200
#define VM_SHM          0x00000400  /* shared memory area, don't swap out */
#define VM_DENYWRITE    0x00000800  /* ETXTBSY on write attempts.. */
#define VM_EXECUTABLE   0x00001000
#define VM_LOCKED       0x00002000
#define VM_IO           0x00004000  /* Memory mapped I/O or similar */
#define VM_RESERVED     0x00080000  /* Don't unmap it from swap_out */
    ulong vm_flags;
    ulong anon_vma;
    char *vma_buf = fill_vma_cache(vma);

    vm_flags = SIZE(vm_area_struct_vm_flags) == sizeof(short) ?
        USHORT(vma_buf + OFFSET(vm_area_struct_vm_flags)) :
        ULONG(vma_buf + OFFSET(vm_area_struct_vm_flags));
    anon_vma = (ulong)VOID_PTR((long)vma_buf +
        MEMBER_OFFSET("vm_area_struct", "anon_vma"));

    /* Do not dump I/O mapped devices, shared memory, or special mappings */
    if (vm_flags & (VM_IO | VM_RESERVED))
        return 0;
    if ((vm_flags & VM_SHARED) && dump_shmem)
        return 1;
    /* If it hasn't been written to, don't write it out */
    if (!anon_vma)
        return 0;
    return 1;
}

/* An ELF note in memory */
struct memelfnote {
    const char *name;
    int type;
    unsigned int datasz;
    void *data;
};

/* from task.c static functions ... change static to global symbol in future - 2008/01/08 */
#undef _NSIG
#define _NSIG       64
#define _NSIG_BPW   machdep->bits
#define _NSIG_WORDS (_NSIG / _NSIG_BPW)

static uint64_t task_blocked(ulong task)
{
    uint64_t sigset = 0;
    ulong offset;

    fill_task_struct(task);
    if (!tt->last_task_read)
        return 0;

    offset = OFFSET(task_struct_blocked);
    switch (_NSIG_WORDS) {
    case 1:
        sigset = (ulonglong) ULONG(tt->task_struct + offset);
        break;
    case 2:
        sigset = ((ulonglong) ULONG(tt->task_struct + offset)) << 32;
        sigset |= ((ulonglong) ULONG(tt->task_struct + offset + 4));
        break;
    }
    return sigset;
}

#define do_div(n,base) ({                       \
    uint32_t __base = (base);                   \
    uint32_t __rem;                             \
    __rem = ((uint64_t)(n)) % __base;           \
    (n) = ((uint64_t)(n)) / __base;             \
    __rem;                                      \
})

#ifndef div_long_long_rem
#define div_long_long_rem(dividend,divisor,remainder)   \
({                                                      \
    unsigned long long result = dividend;               \
    *remainder = do_div(result,divisor);                \
    result;                                             \
})
#endif

static __inline__ void jiffies_to_timeval(const unsigned long jiffies,
                                          struct timeval *value)
{
    /*
     * Convert jiffies to nanoseconds and separate with
     * one divide.
     */
/* HZ is the requested value. ACTHZ is actual HZ ("<< 8" is for accuracy) */
#define PIT_TICK_RATE   1193182UL
#define CLOCK_TICK_RATE PIT_TICK_RATE               /* Underlying HZ */
#define LATCH ((CLOCK_TICK_RATE + HZ/2) / HZ)       /* For divider */
#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LATCH, 8))
#define SH_DIV(NOM,DEN,LSH) (   ((NOM / DEN) << LSH)                 \
                             + (((NOM % DEN) << LSH) + DEN / 2) / DEN)
#define TICK_NSEC (SH_DIV (1000000UL * 1000, ACTHZ, 8))
#ifndef NSEC_PER_SEC
#define NSEC_PER_SEC (1000000000L)
#endif
#ifndef NSEC_PER_USEC
#define NSEC_PER_USEC (1000L)
#endif
    u64 nsec = (u64)jiffies * TICK_NSEC;

    value->tv_sec = div_long_long_rem(nsec, NSEC_PER_SEC, &value->tv_usec);
    value->tv_usec /= NSEC_PER_USEC;
}

static char *signal_buf = NULL;

/*
 * fill up all the fields in prstatus from the given task struct, except
 * registers which need to be filled up separately.
 */
static void fill_prstatus(struct elf_prstatus *prstatus,
                          struct task_context *tc, long signr)
{
    ulong signal_struct = 0, kaddr, handler, sigqueue = 0, next;
    ulong sighand_struct = 0;
    long size;
    ulong flags;
    /* char *signal_buf, *uaddr; */
    char *uaddr;
    uint64_t sigset, blocked;
    uint ti_flags;
    ulong utime, signal_utime, signal_cutime;
    ulong stime, signal_stime, signal_cstime;

    prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
    if (VALID_MEMBER(task_struct_sigpending))
        prstatus->pr_sigpend = INT(tt->task_struct +
            OFFSET(task_struct_sigpending));
    else if (VALID_MEMBER(thread_info_flags)) {
        fill_thread_info(tc->thread_info);
        ti_flags = UINT(tt->thread_info + OFFSET(thread_info_flags));
        prstatus->pr_sigpend = ti_flags & (1 << TIF_SIGPENDING);
    }
    blocked = task_blocked(tc->task);
    prstatus->pr_sighold = blocked;
    prstatus->pr_pid = tc->pid;
    prstatus->pr_ppid = task_to_pid(tc->ptask);
    /* prstatus->pr_pgrp = process_group(p); */
    if (VALID_MEMBER(task_struct_sig))
        signal_struct = ULONG(tt->task_struct + OFFSET(task_struct_sig));
    else if (VALID_MEMBER(task_struct_signal))
        signal_struct = ULONG(tt->task_struct + OFFSET(task_struct_signal));

    size = MAX(SIZE(signal_struct),
               VALID_SIZE(signal_queue) ? SIZE(signal_queue) : SIZE(sigqueue));
    if (VALID_SIZE(sighand_struct))
        size = MAX(size, SIZE(sighand_struct));
    signal_buf = malloc(size);
    readmem(signal_struct, KVADDR, signal_buf, SIZE(signal_struct),
            "signal_struct buffer", FAULT_ON_ERROR);
    prstatus->pr_sid = INT(signal_buf +
        MEMBER_OFFSET("signal_struct", "session"));
    prstatus->pr_pgrp = INT(signal_buf +
        MEMBER_OFFSET("signal_struct", "pgrp"));
    utime = ULONG(tt->task_struct + OFFSET(task_struct_utime));
    stime = ULONG(tt->task_struct + OFFSET(task_struct_stime));
    /* if (thread_group_leader(p)) { */
    if (tc->pid == INT(tt->task_struct + OFFSET(task_struct_tgid))) {
        signal_utime = ULONG(signal_buf +
            MEMBER_OFFSET("signal_struct", "utime"));
        signal_stime = ULONG(signal_buf +
            MEMBER_OFFSET("signal_struct", "stime"));
        jiffies_to_timeval(utime + signal_utime, &prstatus->pr_utime);
        jiffies_to_timeval(stime + signal_stime, &prstatus->pr_stime);
    } else {
        jiffies_to_timeval(utime, &prstatus->pr_utime);
        jiffies_to_timeval(stime, &prstatus->pr_stime);
    }
    signal_cutime = ULONG(signal_buf +
        MEMBER_OFFSET("signal_struct", "cutime"));
    signal_cstime = ULONG(signal_buf +
        MEMBER_OFFSET("signal_struct", "cstime"));
    jiffies_to_timeval(signal_cutime, &prstatus->pr_cutime);
    jiffies_to_timeval(signal_cstime, &prstatus->pr_cstime);
}

#ifndef ELF_OSABI
#define ELF_OSABI ELFOSABI_NONE
#endif

#ifndef ELF_CORE_EFLAGS
#define ELF_CORE_EFLAGS 0
#endif

static inline void fill_elf_header(elfhdr *elf, int segs)
{
    memcpy(elf->e_ident, ELFMAG, SELFMAG);
    elf->e_ident[EI_CLASS] = ELF_CLASS;
    elf->e_ident[EI_DATA] = ELF_DATA;
    elf->e_ident[EI_VERSION] = EV_CURRENT;
    elf->e_ident[EI_OSABI] = ELF_OSABI;
    memset(elf->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);

    elf->e_type = ET_CORE;
    elf->e_machine = ELF_ARCH;
    elf->e_version = EV_CURRENT;
    elf->e_entry = 0;
    elf->e_phoff = sizeof(elfhdr);
    elf->e_shoff = 0;
    elf->e_flags = ELF_CORE_EFLAGS;
    elf->e_ehsize = sizeof(elfhdr);
    elf->e_phentsize = sizeof(elf_phdr);
    elf->e_phnum = segs;
    elf->e_shentsize = 0;
    elf->e_shnum = 0;
    elf->e_shstrndx = 0;
    return;
}

static void fill_note(struct memelfnote *note, const char *name, int type,
                      unsigned int sz, void *data)
{
    note->name = name;
    note->type = type;
    note->datasz = sz;
    note->data = data;
    return;
}

static inline void fill_elf_note_phdr(elf_phdr *phdr, int sz, off_t offset)
{
    phdr->p_type = PT_NOTE;
    phdr->p_offset = offset;
    phdr->p_vaddr = 0;
    phdr->p_paddr = 0;
    phdr->p_filesz = sz;
    phdr->p_memsz = 0;
    phdr->p_flags = 0;
    phdr->p_align = 0;
    return;
}

static int notesize(struct memelfnote *en)
{
    int sz;

    sz = sizeof(elf_note);
    sz += roundup(strlen(en->name) + 1, 4);
    sz += roundup(en->datasz, 4);
    return sz;
}

static void fill_psinfo(struct elf_prpsinfo *psinfo, ulong *p, ulong *mm,
                        const struct elf_prstatus *prstatus)
{
    unsigned int i, len;
    char *tp;
    struct task_context *tc = task_to_context((ulong)p);
    ulong arg_start = ULONG(tt->mm_struct +
        MEMBER_OFFSET("mm_struct", "arg_start"));
    physaddr_t paddr;
    int static_prio;

    if (!tc) {
        error(FATAL, "cannot get tc p=%p\n", p);
        return;
    }
    /* first copy the parameters from user space */
    memset(psinfo, 0, sizeof(struct elf_prpsinfo));
    len = ULONG(tt->mm_struct + MEMBER_OFFSET("mm_struct", "arg_end")) -
        arg_start;
    if (len >= ELF_PRARGSZ)
        len = ELF_PRARGSZ - 1;
    if (!uvtop(tc, arg_start, &paddr, 0)) {     /* see do_vtop */
        error(FATAL, "not mapped arg_start=%p\n", arg_start);
    }
    if (FALSE == readmem(paddr, PHYSADDR, &psinfo->pr_psargs, len,
                         "pr_psargs", RETURN_ON_ERROR))
        return;
    for (i = 0; i < len; i++)
        if (psinfo->pr_psargs[i] == 0)
            psinfo->pr_psargs[i] = ' ';
    psinfo->pr_psargs[len] = 0;

    psinfo->pr_pid = prstatus->pr_pid;
    psinfo->pr_ppid = prstatus->pr_ppid;
    psinfo->pr_pgrp = prstatus->pr_pgrp;
    psinfo->pr_sid = prstatus->pr_sid;

    i = task_state(tc->task) ? ffz(~task_state(tc->task)) + 1 : 0;
    psinfo->pr_state = i;
    psinfo->pr_sname = (i < 0 || i > 5) ? '.' : "RSDTZW"[i];
    psinfo->pr_zomb = psinfo->pr_sname == 'Z';

    tp = fill_task_struct(tc->task);
#define MAX_USER_RT_PRIO        100
#define MAX_RT_PRIO             MAX_USER_RT_PRIO
#define PRIO_TO_NICE(prio)      ((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(static_prio)  PRIO_TO_NICE(static_prio)
    static_prio = INT(tp + MEMBER_OFFSET("task_struct", "static_prio"));
    psinfo->pr_nice = TASK_NICE(static_prio);
    psinfo->pr_flag = task_flags(tc->task);
    /* SET_UID(psinfo->pr_uid, p->uid); */
    /* SET_GID(psinfo->pr_gid, p->gid); */
#if defined(X86)
    psinfo->pr_uid = USHORT(tp + MEMBER_OFFSET("task_struct", "uid"));
    psinfo->pr_gid = USHORT(tp + MEMBER_OFFSET("task_struct", "gid"));
#elif defined(X86_64)
    psinfo->pr_uid = UINT(tp + MEMBER_OFFSET("task_struct", "uid"));
    psinfo->pr_gid = UINT(tp + MEMBER_OFFSET("task_struct", "gid"));
#endif
    strncpy(psinfo->pr_fname, (char *)(tp + OFFSET(task_struct_comm)),
            sizeof(psinfo->pr_fname));
    debug("fill_psinfo:pr_fname:%s\n", psinfo->pr_fname);
    return;
}

#if defined(X86)
#define ELF_CORE_COPY_REGS(pr_reg, regs) do {           \
    (pr_reg)[0] = regs->ebx;                            \
    (pr_reg)[1] = regs->ecx;                            \
    (pr_reg)[2] = regs->edx;                            \
    (pr_reg)[3] = regs->esi;                            \
    (pr_reg)[4] = regs->edi;                            \
    (pr_reg)[5] = regs->ebp;                            \
    (pr_reg)[6] = regs->eax;                            \
    (pr_reg)[7] = regs->xds;                            \
    (pr_reg)[8] = regs->xes;                            \
    (pr_reg)[11] = regs->orig_eax;                      \
    (pr_reg)[12] = regs->eip;                           \
    debug("regs %p -> pr_reg %p \n", regs, &pr_reg);    \
    debug("eip=%p -> %p (%p %p)\n", regs->eip,          \
          pr_reg[12], &regs->eip, &pr_reg[12]);         \
    (pr_reg)[13] = regs->xcs;                           \
    (pr_reg)[14] = regs->eflags;                        \
    (pr_reg)[15] = regs->esp;                           \
    (pr_reg)[16] = regs->xss;                           \
} while (0)
/*
 * TODO: fill fs/gs too:
 *      savesegment(fs, pr_reg[9]);
 *      savesegment(gs, pr_reg[10]);
 */
#elif defined(X86_64)
#define ELF_CORE_COPY_REGS(pr_reg, regs) do {           \
    unsigned v;                                         \
    (pr_reg)[0] = (regs)->r15;                          \
    (pr_reg)[1] = (regs)->r14;                          \
    (pr_reg)[2] = (regs)->r13;                          \
    (pr_reg)[3] = (regs)->r12;                          \
    (pr_reg)[4] = (regs)->rbp;                          \
    (pr_reg)[5] = (regs)->rbx;                          \
    (pr_reg)[6] = (regs)->r11;                          \
    (pr_reg)[7] = (regs)->r10;                          \
    (pr_reg)[8] = (regs)->r9;                           \
    (pr_reg)[9] = (regs)->r8;                           \
    (pr_reg)[10] = (regs)->rax;                         \
    (pr_reg)[11] = (regs)->rcx;                         \
    (pr_reg)[12] = (regs)->rdx;                         \
    (pr_reg)[13] = (regs)->rsi;                         \
    (pr_reg)[14] = (regs)->rdi;                         \
    (pr_reg)[15] = (regs)->orig_rax;                    \
    (pr_reg)[16] = (regs)->rip;                         \
    (pr_reg)[17] = (regs)->cs;                          \
    (pr_reg)[18] = (regs)->eflags;                      \
    (pr_reg)[19] = (regs)->rsp;                         \
    (pr_reg)[20] = (regs)->ss;                          \
} while (0)
/*
 * TODO: fill segment registers too:
 *      (pr_reg)[21] = current->thread.fs;
 *      (pr_reg)[22] = current->thread.gs;
 *      asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v;
 *      asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v;
 *      asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v;
 *      asm("movl %%gs,%0" : "=r" (v)); (pr_reg)[26] = v;
 */
#endif

static inline void elf_core_copy_regs(elf_gregset_t *elfregs,
                                      struct pt_regs *regs)
{
    ELF_CORE_COPY_REGS((*elfregs), regs);
}

struct i387_fxsave_struct {
    u16 cwd;
    u16 swd;
    u16 twd;
    u16 fop;
    u64 rip;
    u64 rdp;
    u32 mxcsr;
    u32 mxcsr_mask;
    u32 st_space[32];   /* 8*16 bytes for each FP-reg = 128 bytes */
    u32 xmm_space[64];  /* 16*16 bytes for each XMM-reg = 256 bytes */
    u32 padding[24];
};

union i387_union {
    struct i387_fxsave_struct fxsave;
};

struct thread_struct {
    unsigned long rsp0;
    unsigned long rsp;
    unsigned long userrsp;  /* Copy from PDA */
    unsigned long fs;
    unsigned long gs;
    unsigned short es, ds, fsindex, gsindex;
    /* Hardware debugging registers */
    unsigned long debugreg0;
    unsigned long debugreg1;
    unsigned long debugreg2;
    unsigned long debugreg3;
    unsigned long debugreg6;
    unsigned long debugreg7;
    /* fault info */
    unsigned long cr2, trap_no, error_code;
    /* floating point info */
    union i387_union i387 __attribute__((aligned(16)));
    /*
     * IO permissions. the bitmap could be moved into the GDT, that would
     * make switch faster for a limited number of ioperm using tasks. -AK
     */
    int ioperm;
    unsigned long *io_bitmap_ptr;
    unsigned io_bitmap_max;
    /* cached TLS descriptors. */
#define GDT_ENTRY_TLS_ENTRIES 3
    u64 tls_array[GDT_ENTRY_TLS_ENTRIES];
};

/*
 * Capture the user space registers if the task is not running (in user space)
 */
#if defined(X86)
int dump_task_regs(struct task_context *tsk, elf_gregset_t *regs)
{
    ulong *ptr;
    struct pt_regs *pp, ptregs;
    char *tp, *ppbuf;
    physaddr_t paddr;

    /*
     * ptregs = *(struct pt_regs *)
     *  ((unsigned long)tsk->thread_info + THREAD_SIZE - sizeof(ptregs));
     */
    tp = fill_task_struct(tsk->task);
    ptr = ULONG_PTR(tp + OFFSET(task_struct_thread_info));
    pp = (struct pt_regs *)((ulong)&ptr[STACKSIZE()/sizeof(ulong)] - 8);
    --pp;
    /* uvtop(tsk, (ulong)pp, &paddr, 0); */
    /* readmem(paddr, PHYSADDR, &ptregs, ... */
    readmem((ulong)pp, KVADDR, &ptregs, sizeof(ptregs),
            "ptregs buf", FAULT_ON_ERROR);
    ptregs.xcs &= 0xffff;
    ptregs.xds &= 0xffff;
    ptregs.xes &= 0xffff;
    ptregs.xss &= 0xffff;
    debug("ptregs.eip=%p esp=%p\n", ptregs.eip, ptregs.esp);
    elf_core_copy_regs(regs, &ptregs);
    return 1;
}
#elif defined(X86_64)
int dump_task_regs(struct task_context *tsk, elf_gregset_t *regs)
{
    struct pt_regs *pp, ptregs;
    char *tp, *ppbuf;
    physaddr_t paddr;

    /* pp = (struct pt_regs *)(tsk->thread.rsp0); */
    tp = fill_task_struct(tsk->task);
    pp = (struct pt_regs *) VOID_PTR(tp + OFFSET(task_struct_thread));
    --pp;
    readmem((ulong)pp, KVADDR, &ptregs, sizeof(ptregs),
            "ptregs buf", FAULT_ON_ERROR);
    ptregs.cs &= 0xffff;
    ptregs.ss &= 0xffff;
#if 1   /* TODO: need? change rsp from thread.rsp to thread.userrsp */
    pp = VOID_PTR(tp + OFFSET(task_struct_thread) + 2 * sizeof(ulong));
    ptregs.rsp = (ulong) pp;
#endif
    debug("ptregs.rip=%p rsp=%p\n", ptregs.rip, ptregs.rsp);
    elf_core_copy_regs(regs, &ptregs);
    return 1;
}
#endif

#define ELF_CORE_COPY_TASK_REGS(tsk, elf_regs) dump_task_regs(tsk, elf_regs)

static inline int elf_core_copy_task_regs(struct task_context *t,
                                          elf_gregset_t *elfregs)
{
#ifdef ELF_CORE_COPY_TASK_REGS
    return ELF_CORE_COPY_TASK_REGS(t, elfregs);
#endif
    return 0;
}

#define PF_USED_MATH 0x00002000 /* if unset the fpu must be initialized before use */

int dump_task_fpu(struct task_context *tsk, struct user_i387_struct *fpu)
{
    int fpvalid;
    char *tp, *ppbuf;
    struct thread_struct *thread;

    tp = fill_task_struct(tsk->task);
    if (MEMBER_EXISTS("task_struct", "used_math"))
        fpvalid = SHORT(tp + MEMBER_OFFSET("task_struct", "used_math"));
    else
        fpvalid = task_flags(tsk->task) & PF_USED_MATH; /* bitwise, not logical */
    thread = (struct thread_struct *)(tp + OFFSET(task_struct_thread));
    if (fpvalid) {
        memcpy(fpu, &thread->i387.fxsave,
               sizeof(struct user_i387_struct));
    }
    return fpvalid;
}

#define ELF_CORE_COPY_FPREGS(tsk, elf_fpregs) dump_task_fpu(tsk, elf_fpregs)

static inline int elf_core_copy_task_fpregs(struct task_context *t,
                                            struct pt_regs *regs,
                                            elf_fpregset_t *fpu)
{
#ifdef ELF_CORE_COPY_FPREGS
    return ELF_CORE_COPY_FPREGS(t, fpu);
#else
    return dump_fpu(regs, fpu);
#endif
}

/* Here is the structure in which status of each thread is captured. */
struct elf_thread_status {
    struct list_head list;
    struct elf_prstatus prstatus;   /* NT_PRSTATUS */
    elf_fpregset_t fpu;             /* NT_PRFPREG */
    struct task_context *thread;
#ifdef ELF_CORE_COPY_XFPREGS
    elf_fpxregset_t xfpu;           /* NT_PRXFPREG */
#endif
    struct memelfnote notes[3];
    int num_notes;
};

/*
 * In order to add the specific thread information for the elf file format,
 * we need to keep a linked list of every thread's pr_status and then
 * create a single section for them in the final core file.
 */
static int elf_dump_thread_status(long signr, struct elf_thread_status *t)
{
    int sz = 0;
    struct task_context *p = t->thread;

    t->num_notes = 0;
    fill_prstatus(&t->prstatus, p, signr);
    elf_core_copy_task_regs(p, &t->prstatus.pr_reg);
    fill_note(&t->notes[0], "CORE", NT_PRSTATUS, sizeof(t->prstatus),
              &(t->prstatus));
    t->num_notes++;
    sz += notesize(&t->notes[0]);

    if ((t->prstatus.pr_fpvalid =
         elf_core_copy_task_fpregs(p, NULL, &t->fpu))) {
        fill_note(&t->notes[1], "CORE", NT_PRFPREG, sizeof(t->fpu),
                  &(t->fpu));
        t->num_notes++;
        sz += notesize(&t->notes[1]);
    }
#ifdef ELF_CORE_COPY_XFPREGS
    if (elf_core_copy_task_xfpregs(p, &t->xfpu)) {
        fill_note(&t->notes[2], "LINUX", NT_PRXFPREG, sizeof(t->xfpu),
                  &t->xfpu);
        t->num_notes++;
        sz += notesize(&t->notes[2]);
    }
#endif
    return sz;
}

static int writenote(struct memelfnote *men, int fd)
{
    static int count = 1;
    elf_note en;
    struct stat stat;

    en.n_namesz = strlen(men->name) + 1;
    en.n_descsz = men->datasz;
    en.n_type = men->type;

    /* XXX - cast from long long to long to avoid need for libgcc.a */
    write(fd, &en, sizeof(en));
    write(fd, men->name, en.n_namesz);
    fstat(fd, &stat);
    lseek(fd, roundup((unsigned long)stat.st_size, 4), SEEK_SET);   /* XXX */
    write(fd, men->data, men->datasz);
    fstat(fd, &stat);
    lseek(fd, roundup((unsigned long)stat.st_size, 4), SEEK_SET);   /* XXX */
    fstat(fd, &stat);
    return 1;
}

void cmd_elfdump(void)
{
#define NUM_NOTES 6
    pid_t pid;
    int segs;
    size_t size = 0;
    int i;
    elfhdr *elf = NULL;
    off_t offset = 0, dataoff;
    unsigned long limit;
    int numnote;
    struct memelfnote *notes = NULL;
    struct elf_prstatus *prstatus = NULL;   /* NT_PRSTATUS */
    struct elf_prpsinfo *psinfo = NULL;     /* NT_PRPSINFO */
    struct task_context *g, *p;
    LIST_HEAD(thread_list);
    struct list_head *t;
    elf_fpregset_t *fpu = NULL;
#ifdef ELF_CORE_COPY_XFPREGS
    elf_fpxregset_t *xfpu = NULL;
#endif
    struct task_context *tc = NULL;
    ulong value;
    ulong mm_struct;
    char *mm_struct_buf;
    char *task_struct_buf;
    int thread_status_size = 0;
#ifndef elf_addr_t
#define elf_addr_t unsigned long
#endif
    elf_addr_t *auxv;
#define MAX_FILENAME 256
    char filename[MAX_FILENAME];
    int fd;
    int c;

    debug = 0;
    while ((c = getopt(argcnt, args, "ds")) != EOF) {
        switch (c) {
        case 'd':
            debug = 1;
            break;
        case 's':
            dump_shmem = 1;
            break;
        default:
            argerrs++;
            break;
        }
    }
    if (argerrs)
        cmd_usage(pc->curcmd, SYNOPSIS);

    elf = malloc(sizeof(*elf));
    if (!elf)
        return;
    prstatus = malloc(sizeof(*prstatus));
    if (!prstatus)
        goto cleanup;
    psinfo = malloc(sizeof(*psinfo));
    if (!psinfo)
        goto cleanup;
    notes = malloc(NUM_NOTES * sizeof(struct memelfnote));
    if (!notes)
        goto cleanup;
    memset(notes, 0, NUM_NOTES * sizeof(struct memelfnote));
    fpu = malloc(sizeof(*fpu));
    if (!fpu)
        goto cleanup;
#ifdef ELF_CORE_COPY_XFPREGS
    xfpu = malloc(sizeof(*xfpu));
    if (!xfpu)
        goto cleanup;
#endif

    /* get task_struct addr. */
    str_to_context(args[optind], &value, &tc);
    pid = value;
    debug("args[1] = %s pid=%d\n", args[1], pid);
    tc = pid_to_context(value);
    if (!tc) {
        error(WARNING, "tc==NULL\n");
        goto cleanup;
    }
    debug("tc->task=%p\n", tc->task);

    if (1) {
        long signr = 0;
        struct elf_thread_status *tmp;
        struct task_mem_usage task_mem_usage, *tm;
        int already_in_list = 0;

        i = 0;
        for (g = p = FIRST_CONTEXT(); i < RUNNING_TASKS(); i++, g++, p++) {
            if (p == NULL) {
                error(INFO, "p==NULL\n");
                break;
            }
            if (p->pid == 0)
                continue;   /* skip init_task */
            do {
                get_task_mem_usage(p->task, &task_mem_usage);
                if (tc->mm_struct == task_mem_usage.mm_struct_addr &&
                    tc->task != p->task) {
                    list_for_each(t, &thread_list) {
                        struct elf_thread_status *th;
                        th = list_entry(t, struct elf_thread_status, list);
                        if (th->thread == p)
                            already_in_list = 1;
                    }
                    if (already_in_list) {
                        debug("thread is already_in_list! tc->task=%p <-> p->task=%p\n",
                              tc->task, p->task);
                        already_in_list = 0;
                        break;
                    }
                    debug("found thread! tc->task=%p <-> p->task=%p\n",
                          tc->task, p->task);
                    tmp = malloc(sizeof(*tmp));
                    if (!tmp)
                        goto cleanup;
                    memset(tmp, 0, sizeof(*tmp));
                    INIT_LIST_HEAD(&tmp->list);
                    tmp->thread = p;
                    list_add(&tmp->list, &thread_list);
                }
            } while ((p = next_thread(p)) != g);
        }   /* end of for */

        list_for_each(t, &thread_list) {
            struct elf_thread_status *tmp;
            int sz;
            tmp = list_entry(t, struct elf_thread_status, list);
            sz = elf_dump_thread_status(signr, tmp);
            thread_status_size += sz;
        }
    }

    /* read again */
    tc = pid_to_context(value);
    fill_task_struct(tc->task);

    memset(prstatus, 0, sizeof(*prstatus));
    fill_prstatus(prstatus, tc, 0);
    /* dump_task_regs(tc, &prstatus->pr_reg); */
    ELF_CORE_COPY_TASK_REGS(tc, &prstatus->pr_reg);

    debug("tc->mm_struct=%p sizeof(mm_struct)=%d\n",
          tc->mm_struct, SIZE(mm_struct));
    mm_struct_buf = GETBUF(SIZE(mm_struct));
    readmem(tc->mm_struct, KVADDR, mm_struct_buf, SIZE(mm_struct),
            "mm_struct buffer", FAULT_ON_ERROR);
    segs = INT(mm_struct_buf + MEMBER_OFFSET("mm_struct", "map_count"));

    /* Set up header */
    fill_elf_header(elf, segs + 1);     /* including notes section */

    fill_note(notes + 0, "CORE", NT_PRSTATUS, sizeof(*prstatus), prstatus);
    fill_mm_struct(tc->mm_struct);
    fill_psinfo(psinfo,
                (ulong *)ULONG((long)tt->task_struct +
                    MEMBER_OFFSET("task_struct", "group_leader")),
                (ulong *)tc->mm_struct, prstatus);
    fill_note(notes + 1, "CORE", NT_PRPSINFO, sizeof(*psinfo), psinfo);

    task_struct_buf = GETBUF(SIZE(task_struct));
    readmem(tc->task, KVADDR, task_struct_buf, SIZE(task_struct),
            "task_struct buffer", FAULT_ON_ERROR);
    fill_note(notes + 2, "CORE", NT_TASKSTRUCT, SIZE(task_struct),
              (void *)task_struct_buf);
    numnote = 3;

    auxv = (elf_addr_t *)(mm_struct_buf +
        MEMBER_OFFSET("mm_struct", "saved_auxv"));
    i = 0;
    do {
        debug("auxv[%d]=%lx auxv[%d]=%lx\n", i, auxv[i], i + 1, auxv[i + 1]);
        i += 2;
    } while (auxv[i - 2] != AT_NULL);
    fill_note(&notes[numnote++], "CORE", NT_AUXV,
              i * sizeof(elf_addr_t), auxv);
    debug("auxv=%p offset(saved_auxv)=%d i=%d numnote=%d\n",
          auxv, MEMBER_OFFSET("mm_struct", "saved_auxv"), i, numnote);

    /* Try to dump the FPU. */
    prstatus->pr_fpvalid = elf_core_copy_task_fpregs(tc, NULL, fpu);
    if (prstatus->pr_fpvalid)
        fill_note(notes + numnote++, "CORE", NT_PRFPREG,
                  sizeof(*fpu), fpu);
#ifdef ELF_CORE_COPY_XFPREGS
    if (elf_core_copy_task_xfpregs(current, xfpu))
        fill_note(notes + numnote++, "LINUX", NT_PRXFPREG,
                  sizeof(*xfpu), xfpu);
#endif

    memset(filename, 0, sizeof(filename));
    sprintf(filename, "elfcore.%d", pid);
    fd = open(filename, O_CREAT | O_RDWR, 0644);
    if (fd == -1) {
        error(WARNING, "cannot open corefile (elfcore.%d) ...\n", pid);
        goto cleanup;
    }

    write(fd, elf, sizeof(*elf));
    offset += sizeof(*elf);                     /* Elf header */
    offset += (segs + 1) * sizeof(elf_phdr);    /* Program headers */

    /* Write notes phdr entry */
    {
        elf_phdr phdr;
        int sz = 0;

        for (i = 0; i < numnote; i++) {
            debug("notesize[%d]=%d\n", i, notesize(notes + i));
            sz += notesize(notes + i);
        }
        sz += thread_status_size;
        debug("fill_elf_note_phdr:sz=%d offset=%d\n", sz, offset);
        fill_elf_note_phdr(&phdr, sz, offset);
        offset += sz;
        write(fd, &phdr, sizeof(phdr));
    }

    /* Page-align dumped data */
    dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

    ulong vma;
    ulong vm_start;
    ulong vm_end;
    ulong vm_next;
    ulong vm_flags;
    char *vma_buf;

    /* Write program headers for segments dump */
    vma = ULONG(tt->mm_struct + OFFSET(mm_struct_mmap));
    debug("vma=%p\n", vma);
    for (; (void *)vma != NULL; vma = vm_next) {
        elf_phdr phdr;
        size_t sz;

/* from linux/mm.h */
#define VM_READ     0x00000001  /* currently active flags */
#define VM_WRITE    0x00000002
#define VM_EXEC     0x00000004
        vma_buf = fill_vma_cache(vma);
        vm_start = ULONG(vma_buf + OFFSET(vm_area_struct_vm_start));
        vm_end = ULONG(vma_buf + OFFSET(vm_area_struct_vm_end));
        vm_flags = SIZE(vm_area_struct_vm_flags) == sizeof(short) ?
            USHORT(vma_buf + OFFSET(vm_area_struct_vm_flags)) :
            ULONG(vma_buf + OFFSET(vm_area_struct_vm_flags));
        sz = vm_end - vm_start;

        phdr.p_type = PT_LOAD;
        phdr.p_offset = offset;
        phdr.p_vaddr = vm_start;
        phdr.p_paddr = 0;
        phdr.p_filesz = maydump(vma) ? sz : 0;
        phdr.p_memsz = sz;
        offset += phdr.p_filesz;
        phdr.p_flags = vm_flags & VM_READ ? PF_R : 0;
        if (vm_flags & VM_WRITE)
            phdr.p_flags |= PF_W;
        if (vm_flags & VM_EXEC)
            phdr.p_flags |= PF_X;
        phdr.p_align = ELF_EXEC_PAGESIZE;

        write(fd, &phdr, sizeof(phdr));
        vm_next = (ulong)VOID_PTR(vma_buf + OFFSET(vm_area_struct_vm_next));
    }

#ifdef ELF_CORE_WRITE_EXTRA_PHDRS
    ELF_CORE_WRITE_EXTRA_PHDRS;
#endif

    /* write out the notes section */
    for (i = 0; i < numnote; i++)
        if (!writenote(notes + i, fd))
            goto end_coredump;

    /* write out the thread status notes section */
    list_for_each(t, &thread_list) {
        struct elf_thread_status *tmp =
            list_entry(t, struct elf_thread_status, list);
        for (i = 0; i < tmp->num_notes; i++)
            if (!writenote(&tmp->notes[i], fd))
                goto end_coredump;
    }

    debug("dataoff=%d\n", dataoff);
    lseek(fd, dataoff, SEEK_SET);
    size = 0;

    if (!signal_buf) {
        error(INFO, "signal_buf is null\n");
        goto cleanup;
    } else {
        struct rlimit *rlim;
        /* note: the member offset must be applied to the char pointer */
        rlim = (struct rlimit *)(signal_buf +
            MEMBER_OFFSET("signal_struct", "rlim"));
        limit = rlim[RLIMIT_CORE].rlim_cur;
    }

    for (vma = ULONG(tt->mm_struct + OFFSET(mm_struct_mmap));
         vma; vma = vm_next) {
        unsigned long addr;

        vma_buf = fill_vma_cache(vma);
        if (!maydump(vma)) {
            vm_next = (ulong)VOID_PTR(vma_buf +
                OFFSET(vm_area_struct_vm_next));
            continue;
        }
        vm_start = ULONG(vma_buf + OFFSET(vm_area_struct_vm_start));
        vm_end = ULONG(vma_buf + OFFSET(vm_area_struct_vm_end));
        vm_flags = SIZE(vm_area_struct_vm_flags) == sizeof(short) ?
USHORT(vma_buf + OFFSET(vm_area_struct_vm_flags)) : ULONG(vma_buf + OFFSET(vm_area_struct_vm_flags)); debug("vma=%p vm_start=%p vm_end=%p\n", vma, vm_start, vm_end); for (addr = vm_start; addr < vm_end; addr += PAGE_SIZE) { physaddr_t paddr; if (!uvtop(tc, addr, &paddr, 0)) { debug("\taddr=%p skip(the page is not mapped)\n", addr); lseek(fd, PAGE_SIZE, SEEK_CUR); } else { void *pagebuf = malloc(PAGE_SIZE); debug("\taddr=%p paddr=%p\n", addr, paddr); if (FALSE == readmem(paddr, PHYSADDR, pagebuf, PAGE_SIZE, "read page", RETURN_ON_ERROR) ){ error(WARNING, "addr=%p paddr=%p skip(readmem false)\n", addr, paddr); lseek(fd, PAGE_SIZE, SEEK_CUR); free(pagebuf); }else{ if ((size += PAGE_SIZE) > limit || -1 == write(fd, pagebuf, PAGE_SIZE)) { free(pagebuf); goto end_coredump; } free(pagebuf); } } } vm_next = (ulong)VOID_PTR(vma_buf + OFFSET(vm_area_struct_vm_next)) ; } #ifdef ELF_CORE_WRITE_EXTRA_DATA ELF_CORE_WRITE_EXTRA_DATA; #endif #if 0 if ((off_t) file->f_pos != offset) { /* Sanity check */ printk("elf_core_dump: file->f_pos (%ld) != offset (%ld)\n", (off_t) file->f_pos, offset); } #endif end_coredump: close(fd); error(INFO,"write %s done...\n", filename); cleanup: free(elf); free(prstatus); free(psinfo); free(notes); free(fpu); #ifdef ELF_CORE_COPY_XFPREGS free(xfpu); #endif } char *help_elfdump[] = { "elfdump", /* command name */ "create process elf coredump by pid", /* short description */ "[-d] [-s] pid ", /* argument synopsis, or " " if none */ " This command simply create a elf coredump file of specified pid.", " Supported on x86 and x86_64 only.", " -d debug mode ", " -s dump shmem ", "\nEXAMPLE", " write pid's elf coredump image:\n", " crash> elfdump 1024", " write elfcore.1024 done...", " crash> ", " now you can debug above process by \"gdb binary elfcore.1024\" !", "\nBUG", " please do 'vm -p pid' before do elfdump , so you'll be happy! 
:)", NULL }; static struct command_table_entry command_table[] = { "elfdump", cmd_elfdump, help_elfdump, 0, /* One or more commands, */ NULL, /* terminated by NULL, */ }; _init() /* Register the command set. */ { register_extension(command_table); } /* * The _fini() function is called if the shared object is unloaded. * If desired, perform any cleanups here. */ _fini() { }
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility