On 07/04/2014 05:26 PM, Ondrej Oprala wrote:
> On 07/03/2014 02:58 PM, Suzuki K. Poulose wrote:
>> On 07/03/2014 06:06 PM, Ondrej Oprala wrote:
>>> On 07/03/2014 12:30 PM, Suzuki K. Poulose wrote:
>>>> On 05/29/2014 11:53 PM, Suzuki K. Poulose wrote:
>>>>> On 05/29/2014 06:47 PM, Ondrej Oprala wrote:
>>>>>> On 05/29/2014 02:45 PM, Suzuki K. Poulose wrote:
>>>>>>> On 05/29/2014 05:16 PM, Ondrej Oprala wrote:
>>>>>>>> On 05/29/2014 01:44 PM, Janani Venkataraman wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We have developed a tool called "gencore" which captures the core
>>>>>>>>> of an application without disrupting its process. The dump is
>>>>>>>>> collected non-disruptively and this tool currently supports s390,
>>>>>>>>> x86 and power systems.
>>>>>>>>>
>>>>>>>>> THE TOOL:
>>>>>>>>>
>>>>>>>>> The tool can perform non-disruptive third party dumps. The tool
>>>>>>>>> also contains a library "libgencore" which helps applications to
>>>>>>>>> trigger self dumps.
>>>>>>>>>
>>>>>>>>> The tool can perform:
>>>>>>>>>
>>>>>>>>> 1) Third party dump: The pid of the process to be dumped is given
>>>>>>>>> along with the name of the core-file to be created.
>>>>>>>>>
>>>>>>>>> eg.
>>>>>>>>>
>>>>>>>>> [janani@localhost]:gencore 6616 core.test
>>>>>>>>>
>>>>>>>>> 2) Self dump: The programs can request a self-dump using the
>>>>>>>>> gencore() API, provided through libgencore. This is implemented
>>>>>>>>> through a daemon which listens on a UNIX file socket for such
>>>>>>>>> requests. The daemon is started immediately post installation.
>>>>>>>>> The program which requires the dump makes use of the gencore()
>>>>>>>>> API and provides the name of the core-file as a parameter.
>>>>>>>>>
>>>>>>>>> eg.
>>>>>>>>>
>>>>>>>>> /* Opening the library, in this case the library is present in the
>>>>>>>>> /usr/lib64 */
>>>>>>>>> lib = dlopen("libgencore.so", RTLD_LAZY);
>>>>>>>>>
>>>>>>>>> gencore = dlsym(lib, "gencore");
>>>>>>>>>
>>>>>>>>> Call the API:
>>>>>>>>> gencore("/home/janani/core_test").
>>>>>>>>>
>>>>>>>>> BASIC IDEA:
>>>>>>>>>
>>>>>>>>> The basic idea is that the threads of the process are held using
>>>>>>>>> ptrace calls and the dump is generated in the ELF format using the
>>>>>>>>> /proc/pid filesystem.
>>>>>>>>>
>>>>>>>>> PATCH SET:
>>>>>>>>> We have designed this tool based on the discussions with linux
>>>>>>>>> kernel community. The patches have been posted at:
>>>>>>>>> https://lkml.org/lkml/2014/3/20/138
>>>>>>>>>
>>>>>>>>> Do you think this can be part of the util-linux bundle? We can
>>>>>>>>> tweak it to make it work as a package in util-linux.
>>>>>>>>>
>>>>>>>>> Let us know your reviews and comments.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Janani
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>> util-linux" in the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>> Interesting,
>>>>>>>> but how is this different from attaching to a process with GDB and
>>>>>>>> using the gcore command? Or to automate it more, using the gcore
>>>>>>>> script that comes with GDB?
>>>>>>>> Cheers,
>>>>>>>> Ondrej
>>>>>>>>
>>>>>>> There are two major issues with that.
>>>>>>>
>>>>>>> 1) GDB uses PTRACE_ATTACH and hence the process gets a SIGSTOP.
>>>>>> I fail to see the downside to that.
>>>>>>> 2) A process cannot initiate the request to dump itself, say from a
>>>>>>> signal handler. (since fork() is not signal safe)
>>>>>> This should be possible using libgdb. Let's say forking while in a
>>>>>> SIGSEGV handler and using the libgdb API to do the dump.
>>>>> That's exactly the problem. Forking within a sighandler is not safe.
>>>>> You could possibly deadlock with glibc locks.
>>>> Ondrej,
>>>>
>>>> What are your thoughts about this?
>>>>
>>>> Thanks
>>>> Suzuki
>>>>
>>> Hi Suzuki,
>>>
>>> from the LKML mailing list, I can see that the biggest
>>> criticism/confusion related to gencore comes from your necessity
>>> claims around the daemon part.
>> The daemon part was a shared philosophy from the CRIU project. There is
>> no other reliable way of doing a self dump.
> Yes, I think that you explained the problem with self-ptrace
> clearly enough on the LKML.
>>> I'm not entirely sure what kind of programs is gencore going to be most
>>> used/useful for..
>> This can be used by huge applications, like, JAVA RUNTIME, to trigger a
>> dump when it detects some issues, without actually bringing down the
>> workload.
> Well, on 64-bit archs, huge programs may eat up terabytes of
> virtual memory, so normal dumps are sometimes close to impossible
> (though I'd really like to stress-test gdb with a massive 1TB coredump).
> Do you somehow get the process' VM size before dumping?
> To limit the mappings to be dumped, for example...
>>> but isn't the signalfd API solving the problem of async-signal safety?
>>> Using it, you should be able to catch the signal, safely fork
>>> and happily exec gencore.
>> This imposes a lot of changes in the applications that may want to use
>> the API and is prone to errors in attaining the same.
> But see, now we've moved from "CAN'T be done in any other way"
> to "CAN be done in other ways, although it might be non-trivial
> for some projects". I'm not saying the daemon doesn't have its
> usecases. I'm only trying to point out here, that there indeed ARE
> other ways.
>>> No need for any other daemon running.
>>>
>> The daemon doesn't add much overhead. With systemd, you could make use
>> of the socket option to optimize the triggering of the gencore.
> I still haven't had time to look at the code itself. Does the daemon
> have to be running if I want to use the signalfd + fork + exec(gencore)
> approach mentioned above?

Sorry, this one was lost in other emails. No, we don't need a daemon if
you can reliably invoke gencore.

Cheers
Suzuki