[PATCH 00/33] [RFC] Non disruptive application core dump infrastructure

Hi all,

The following series implements infrastructure for capturing a core dump of a
running application without terminating or otherwise disrupting the process.

Kernel Space Approach:

1) We posted an RFD to LKML describing the kernel-side methods under
consideration.

https://lkml.org/lkml/2013/9/3/122

2) We then implemented the dump using the task_work_add approach and posted it
as an RFC to LKML.

http://lwn.net/Articles/569534/

Based on the responses to that RFC, the present series implements the same
functionality in user space.

User Space Approach:

We didn't adopt the CRIU approach because our method gives us a head start:
all that a distro needs is the ptrace functionality used below and nothing
more, and that is available from kernel version 3.4 onwards.

Basic Idea of the User Space Approach:

1) The threads are held using PTRACE_SEIZE and PTRACE_INTERRUPT.

2) The dump is then taken as follows:
    1) The register sets (general purpose, floating point and arch-specific)
    are collected through PTRACE_GETREGSET calls, passing the appropriate
    register set type as a parameter.
    2) The virtual memory maps are collected from /proc/pid/maps.
    3) The auxiliary vector is collected from /proc/pid/auxv.
    4) Process state information for filling notes such as PRSTATUS and
    PRPSINFO is collected from /proc/pid/stat and /proc/pid/status.
    5) The actual memory is read through the process_vm_readv syscall, as
    suggested by Andi Kleen.
    6) Command line arguments are collected from /proc/pid/cmdline.

3) The threads are then released using PTRACE_DETACH.
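
As a rough illustration of steps 1) to 3) for a single thread (this is not the
code from the series), a minimal sketch could look like the following. The
helper names hold_thread(), read_gp_regs(), read_memory() and release_thread()
are made up for this example, and struct user_regs_struct assumes an
architecture such as x86 or arm64 where glibc provides it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>          /* NT_PRSTATUS */
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>      /* struct iovec, process_vm_readv() */
#include <sys/user.h>     /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Step 1: attach without stopping the thread, then ask it to stop. */
static int hold_thread(pid_t tid)
{
    int status;

    if (ptrace(PTRACE_SEIZE, tid, NULL, NULL) == -1)
        return -1;
    /* A real tool would inspect the stop reason; any stop is accepted here. */
    if (ptrace(PTRACE_INTERRUPT, tid, NULL, NULL) == -1 ||
        waitpid(tid, &status, __WALL) == -1 || !WIFSTOPPED(status)) {
        ptrace(PTRACE_DETACH, tid, NULL, NULL);   /* best effort */
        return -1;
    }
    return 0;
}

/* Step 2.1: fetch the general purpose register set of a stopped thread. */
static int read_gp_regs(pid_t tid, struct user_regs_struct *regs)
{
    struct iovec iov = { .iov_base = regs, .iov_len = sizeof(*regs) };

    assert(regs != NULL);
    if (ptrace(PTRACE_GETREGSET, tid,
               (void *)(unsigned long)NT_PRSTATUS, &iov) == -1)
        return -1;
    return 0;
}

/* Step 2.5: copy len bytes from remote_addr in the target into dst. */
static ssize_t read_memory(pid_t pid, void *dst, void *remote_addr, size_t len)
{
    struct iovec local  = { .iov_base = dst,         .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

/* Step 3: let the thread run again. */
static int release_thread(pid_t tid)
{
    return ptrace(PTRACE_DETACH, tid, NULL, NULL) == -1 ? -1 : 0;
}
```

For a multi-threaded target, hold_thread() would be called for every entry
under /proc/pid/task before any memory is read, so that the dump reflects one
consistent state.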

Self Dump:

A self dump is implemented with the following approach which was adapted
from CRIU:

Gencore Daemon

Programs can request a dump of themselves through the gencore() API provided
by libgencore. Requests are served by a daemon which listens on a UNIX domain
socket bound to a filesystem path. The daemon is started immediately after
installation.
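
To illustrate the daemon side, the sketch below sets up a listening UNIX
socket and looks up the PID of a connected client. It assumes that the
client's PID is obtained with SO_PEERCRED, which is one common way to do it on
Linux; the series may do this differently. The names listen_unix() and
peer_pid() are invented for the example.

```c
#define _GNU_SOURCE       /* for struct ucred */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening UNIX stream socket bound to path; -1 on failure. */
static int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* path does not fit in sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                       /* drop a stale socket file, if any */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 5) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the PID of the process on the other end of conn_fd, or -1. */
static pid_t peer_pid(int conn_fd)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    if (getsockopt(conn_fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) == -1)
        return -1;
    return cred.pid;
}
```

The daemon would call peer_pid() on the descriptor returned by accept() and
dump that PID, so the client does not have to be trusted to report its own.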

We have provided service scripts for integration with systemd.

NOTE:

On systems with systemd, we could make use of socket activation, which avoids
the need to keep the gencore daemon running all the time. systemd can wait on
the socket for requests and start the daemon as and when required. However,
since the systemd socket APIs are not exported yet, we have disabled the
supporting code for this feature.
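
For reference, the socket activation protocol documented in sd_listen_fds(3)
passes the listening descriptors through the LISTEN_PID and LISTEN_FDS
environment variables, with the first descriptor at number 3. The sketch below
checks these by hand without linking against any systemd library. It is only
an illustration and is not the disabled code from patch 25; the function names
are invented.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_FDS_START 3   /* first passed descriptor, per sd_listen_fds(3) */

/*
 * Decide from the two environment values whether a socket was passed to the
 * process with PID self. Returns the descriptor number, or -1 if not.
 */
static int activated_socket_from(const char *pid_str, const char *fds_str,
                                 pid_t self)
{
    char *end;
    long v;

    assert(self > 0);
    if (pid_str == NULL || fds_str == NULL)
        return -1;

    /* LISTEN_PID must name this very process. */
    v = strtol(pid_str, &end, 10);
    if (end == pid_str || *end != '\0' || v != (long)self)
        return -1;

    /* LISTEN_FDS must announce at least one descriptor. */
    v = strtol(fds_str, &end, 10);
    if (end == fds_str || *end != '\0' || v < 1)
        return -1;

    return LISTEN_FDS_START;
}

/* Convenience wrapper that reads the real environment. */
static int activated_socket(void)
{
    return activated_socket_from(getenv("LISTEN_PID"), getenv("LISTEN_FDS"),
                                 getpid());
}
```

A daemon could call activated_socket() at start-up and fall back to creating
its own socket when the result is -1.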

libgencore:

1) The client interface is a standard library call. All that the dump requester
has to do is open the library and call the gencore() API; the dump is then
generated at the path specified, which may be relative or absolute.

To Do:

1) Presently we wait indefinitely for all the threads to be seized. We can add
a time-out that bounds how long we wait. It can be passed as a command line
argument in the case of a third party dump, and through the library call in
the case of a self dump. We still need to work out a suitable default.
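
One possible shape for such a time-out, sketched here under the assumption
that each thread is waited for with waitpid(), is to poll with WNOHANG against
a monotonic deadline. The function name wait_with_timeout() and the 1 ms
polling interval are arbitrary choices for the example.

```c
#define _GNU_SOURCE       /* for __WALL */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait until tid reports a state change or timeout_ms elapses.
 * Returns 1 and fills *status on a state change, 0 on time-out, -1 on error.
 */
static int wait_with_timeout(pid_t tid, int timeout_ms, int *status)
{
    struct timespec start, now;
    struct timespec nap = { 0, 1000000 };   /* poll every 1 ms */
    long elapsed_ms;
    pid_t r;

    assert(status != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &start) == -1)
        return -1;

    for (;;) {
        r = waitpid(tid, status, __WALL | WNOHANG);
        if (r == tid)
            return 1;                       /* stopped, exited or killed */
        if (r == -1)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) == -1)
            return -1;
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                     (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_ms)
            return 0;                       /* gave up waiting */

        nanosleep(&nap, NULL);
    }
}
```

On a time-out the caller would detach from the threads already seized and
report failure rather than leave the target partially stopped.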

2) As mentioned before, the systemd socket APIs are not exported yet and hence
this option is disabled for now. Once these APIs are available we can enable
the socket option.

We would like to push this to one of the following packages:
a) util-linux
b) coreutils
c) procps-ng

We are not sure which one would suit this application the best.
Please let us know your views on the same.

Patches 1-16 implement the dump generation.

Patches 17-24 implement the daemon approach.

Patch 25 implements the systemd socket approach.

Patches 26-27 implement the client-interface library.

Patches 28-33 handle the building and other packaging aspects.

Please review and let us know your comments.

Thanks.

Janani Venkataraman (33):
      Configure and Make files
      Validity of arguments
      Process Status
      Hold threads
      Fetching Memory maps
      Check ELF class
      Do elf_coredump
      Fills elf header
      Adding notes infrastructure
      Populates PRPS info
      Populate AUXV
      Fetch File maps
      Fetching thread specific Notes
      Populating Program Headers
      Updating Offset
      Writing to core file
      Daemonizing the Process
      Socket operations
      Block till request
      Handling Requests
      Get Clients PID
      Dump the task
      Handling SIG TERM of the daemon
      Handling SIG TERM of the child
      Systemd Socket ID retrieval
      [libgencore] Setting up Connection
      [libgencore] Request for dump
      Man pages
      Automake files for the doc folder
      README, COPYING, Changelog
      Spec file
      Socket and Service files.
      Support check


 COPYING            |   24 ++
 COPYING.LIBGENCORE |   24 ++
 Changelog          |    7 
 Makefile.am        |   22 +
 README             |  108 +++++++
 configure.ac       |    8 +
 doc/Makefile.am    |    2 
 doc/gencore.1      |   31 ++
 doc/gencore.3      |   28 ++
 gencore.service    |    9 +
 gencore.socket     |   10 +
 gencore.spec.in    |   88 ++++++
 gencore@.service   |    9 +
 libgencore.pc.in   |    8 +
 src/Makefile.am    |   13 +
 src/client.c       |  121 ++++++++
 src/coredump.c     |  764 ++++++++++++++++++++++++++++++++++++++++++++++++
 src/coredump.h     |   74 +++++
 src/elf-compat.h   |  124 ++++++++
 src/elf.c          |  827 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/elf32.c        |   43 +++
 src/elf64.c        |   44 +++
 src/gencore.h      |    1 
 src/proc.c         |  278 +++++++++++++++++
 24 files changed, 2667 insertions(+)
 create mode 100644 COPYING
 create mode 100644 COPYING.LIBGENCORE
 create mode 100644 Changelog
 create mode 100644 README
 create mode 100644 doc/Makefile.am
 create mode 100644 doc/gencore.1
 create mode 100644 doc/gencore.3
 create mode 100644 gencore.service
 create mode 100644 gencore.socket
 create mode 100644 gencore.spec.in
 create mode 100644 gencore@.service
 create mode 100644 libgencore.pc.in
 create mode 100644 src/Makefile.am
 create mode 100644 src/client.c
 create mode 100644 src/coredump.c
 create mode 100644 src/coredump.h
 create mode 100644 src/elf-compat.h
 create mode 100644 src/elf.c
 create mode 100644 src/elf32.c
 create mode 100644 src/elf64.c
 create mode 100644 src/gencore.h
 create mode 100644 src/proc.c

-- 
Janani

--
To unsubscribe from this list: send the line "unsubscribe util-linux" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



