Hi all,

The following series implements an infrastructure for capturing the core of
an application without disrupting its process.

Kernel Space Approach:

1) Posted an RFD to LKML explaining the various kernel methods being
   analysed.

   https://lkml.org/lkml/2013/9/3/122

2) Went ahead and implemented the same using the task_work_add approach and
   posted an RFC to LKML.

   http://lwn.net/Articles/569534/

Based on the responses, the present approach implements the same in user
space.

User Space Approach:

We didn't adopt the CRIU approach because our method gives us a head start:
all that a distro needs is the ptrace functionality, which is available from
kernel version 3.4 onwards, and nothing more.

Basic Idea of User Space:

1) The threads are held using PTRACE_SEIZE and PTRACE_INTERRUPT.

2) The dump is then taken as follows:
   1) The register sets, namely the general purpose, floating point and
      arch-specific register sets, are collected through PTRACE_GETREGSET
      calls by passing the appropriate register type as a parameter.
   2) The virtual memory maps are collected from /proc/pid/maps.
   3) The auxiliary vector is collected from /proc/pid/auxv.
   4) Process state information for filling notes such as PRSTATUS and
      PRPSINFO is collected from /proc/pid/stat and /proc/pid/status.
   5) The actual memory is read through the process_vm_readv syscall, as
      suggested by Andi Kleen.
   6) Command line arguments are collected from /proc/pid/cmdline.

3) The threads are then released using PTRACE_DETACH.

Self Dump:

A self dump is implemented with the following approach, which was adapted
from CRIU:

Gencore Daemon:

Programs can request a dump using the gencore() API, provided through
libgencore. This is implemented through a daemon which listens on a UNIX
file socket. The daemon is started immediately after installation.

We have provided service scripts for integration with systemd.
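
For readers who want a concrete picture of steps 1, 2.2 and 3 of the basic
idea above, here is a minimal sketch. It is not taken from the patches: the
names hold_thread(), release_thread(), parse_maps_line() and struct
map_entry are hypothetical, and the 255-byte path limit is an arbitrary
choice for illustration. Only the ptrace requests and the /proc/pid/maps
line layout come from the kernel interfaces themselves.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical record for one line of /proc/pid/maps. */
struct map_entry {
	unsigned long start, end, offset;
	char perms[5];		/* e.g. "r-xp" */
	char path[256];		/* empty for anonymous mappings */
};

/* Step 1: seize one thread and stop it without sending a signal. */
int hold_thread(pid_t tid)
{
	int status;

	if (ptrace(PTRACE_SEIZE, tid, NULL, NULL) == -1)
		return -1;

	if (ptrace(PTRACE_INTERRUPT, tid, NULL, NULL) == -1 ||
	    waitpid(tid, &status, __WALL) == -1 ||
	    !WIFSTOPPED(status)) {
		/* Best-effort cleanup; this may itself fail. */
		ptrace(PTRACE_DETACH, tid, NULL, NULL);
		return -1;
	}
	return 0;
}

/* Step 3: let the thread run again. */
int release_thread(pid_t tid)
{
	return ptrace(PTRACE_DETACH, tid, NULL, NULL) == -1 ? -1 : 0;
}

/*
 * Step 2.2: parse one line of /proc/pid/maps, which looks like
 *   start-end perms offset major:minor inode [path]
 * Returns 0 on success and -1 if the line does not match.
 */
int parse_maps_line(const char *line, struct map_entry *m)
{
	int n;

	m->path[0] = '\0';
	n = sscanf(line, "%lx-%lx %4s %lx %*x:%*x %*u %255[^\n]",
		   &m->start, &m->end, m->perms, &m->offset, m->path);

	/* The path is optional, so four converted fields are enough. */
	return n >= 4 ? 0 : -1;
}
```

A caller would typically run hold_thread() for every entry in
/proc/pid/task, feed each line of /proc/pid/maps to parse_maps_line(), and
call release_thread() on every thread once the dump has been written.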

NOTE:
On systems with systemd, we could make use of the socket option, which would
avoid the need to keep the gencore daemon running all the time. systemd can
wait on the socket for requests and trigger the daemon as and when required.
However, since the systemd socket APIs are not exported yet, we have
disabled the supporting code for this feature.

libgencore:

1) The client interface is a standard library call. All that the dump
   requester does is open the library and call the gencore() API, and the
   dump will be generated in the path specified (relative or absolute).

To Do:

1) Presently we wait indefinitely for all the threads to be seized. We can
   add a time-out to bound how long we wait. This can be passed as a command
   line argument in the case of a third-party dump, and through the library
   call in the case of a self dump. We still need to work out a suitable
   value for the time-out.

2) As mentioned before, the systemd socket APIs are not exported yet and
   hence this option is disabled for now. Once these APIs are available we
   can enable the socket option.

We would like to push this to one of the following packages:
a) util-linux
b) coreutils
c) procps-ng

We are not sure which one would suit this application best.
Please let us know your views on the same.

Patches 1-16 implement the dump generation.
Patches 17-24 implement the daemon approach.
Patch 25 implements the systemd socket approach.
Patches 26-27 implement the client-interface library.
Patches 28-33 handle the building and other packaging aspects.

Please let us know your reviews and comments.

Thanks.

Janani Venkataraman (33):
      Configure and Make files
      Validity of arguments
      Process Status
      Hold threads
      Fetching Memory maps
      Check ELF class
      Do elf_coredump
      Fills elf header
      Adding notes infrastructure
      Populates PRPS info
      Populate AUXV
      Fetch File maps
      Fetching thread specific Notes
      Populating Program Headers
      Updating Offset
      Writing to core file
      Daemonizing the Process
      Socket operations
      Block till request
      Handling Requests
      Get Clients PID
      Dump the task
      Handling SIG TERM of the daemon
      Handling SIG TERM of the child
      Systemd Socket ID retrieval
      [libgencore] Setting up Connection
      [libgencore] Request for dump
      Man pages
      Automake files for the doc folder
      README, COPYING, Changelog
      Spec file
      Socket and Service files.
      Support check

 COPYING            |   24 ++
 COPYING.LIBGENCORE |   24 ++
 Changelog          |    7 
 Makefile.am        |   22 +
 README             |  108 +++++++
 configure.ac       |    8 +
 doc/Makefile.am    |    2 
 doc/gencore.1      |   31 ++
 doc/gencore.3      |   28 ++
 gencore.service    |    9 +
 gencore.socket     |   10 +
 gencore.spec.in    |   88 ++++++
 gencore@.service   |    9 +
 libgencore.pc.in   |    8 +
 src/Makefile.am    |   13 +
 src/client.c       |  121 ++++++++
 src/coredump.c     |  764 ++++++++++++++++++++++++++++++++++++++++++++++++
 src/coredump.h     |   74 +++++
 src/elf-compat.h   |  124 ++++++++
 src/elf.c          |  827 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/elf32.c        |   43 +++
 src/elf64.c        |   44 +++
 src/gencore.h      |    1 
 src/proc.c         |  278 +++++++++++++++++
 24 files changed, 2667 insertions(+)
 create mode 100644 COPYING
 create mode 100644 COPYING.LIBGENCORE
 create mode 100644 Changelog
 create mode 100644 README
 create mode 100644 doc/Makefile.am
 create mode 100644 doc/gencore.1
 create mode 100644 doc/gencore.3
 create mode 100644 gencore.service
 create mode 100644 gencore.socket
 create mode 100644 gencore.spec.in
 create mode 100644 gencore@.service
 create mode 100644 libgencore.pc.in
 create mode 100644 src/Makefile.am
 create mode 100644 src/client.c
 create mode 100644 src/coredump.c
 create mode 100644 src/coredump.h
 create mode 100644 src/elf-compat.h
 create mode 100644 src/elf.c
 create mode 100644 src/elf32.c
 create mode 100644 src/elf64.c
 create mode 100644 src/gencore.h
 create mode 100644 src/proc.c

--
Janani