On Mon, Aug 05, 2024 at 06:44:44PM +0000, Brian Mak wrote: > On Aug 5, 2024, at 10:25 AM, Kees Cook <kees@xxxxxxxxxx> wrote: > > > On Thu, Aug 01, 2024 at 05:58:06PM +0000, Brian Mak wrote: > >> On Jul 31, 2024, at 7:52 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > >>> One practical concern with this approach is that I think the ELF > >>> specification says that program headers should be written in memory > >>> order. So a comment on your testing to see if gdb or rr or any of > >>> the other debuggers that read core dumps cares would be appreciated. > >> > >> I've already tested readelf and gdb on core dumps (truncated and whole) > >> with this patch and it is able to read/use these core dumps in these > >> scenarios with a proper backtrace. > > > > Can you compare the "rr" selftest before/after the patch? They have been > > the most sensitive to changes to ELF, ptrace, seccomp, etc, so I've > > tried to double-check "user visible" changes with their tree. :) > > Hi Kees, > > Thanks for your reply! > > Can you please give me some more information on these self tests? > What/where are they? I'm not too familiar with rr. I start from where whenever I go through their tests: https://github.com/rr-debugger/rr/wiki/Building-And-Installing#tests > > And those VMAs weren't thread stacks? > > Admittedly, I did do all of this exploration months ago, and only have > my notes to go off of here, but no, they should not have been thread > stacks since I had pulled all of them in during a "first pass". Okay, cool. I suspect you'd already explored that, but I wanted to be sure we didn't have an "easy to explain" solution. ;) > > It does also feel like part of the overall problem is that systemd > > doesn't have a way to know the process is crashing, and then creates the > > truncation problem. (i.e. we're trying to use the kernel to work around > > a visibility issue in userspace.) > > Even if systemd had visibility into the fact that a crash is happening, > there's not much systemd can do in some circumstances. In applications > with strict time to recovery limits, the process needs to restart within > a certain time limit. We run into a similar issue as the issue I raised > in my last reply on this thread: to keep the core dump intact and > recover, we either need to start up a new process while the old one is > core dumping, or wait until core dumping is complete to restart. > > If we start up a new process while the old one is core dumping, we risk > system stability in applications with a large memory footprint since we > could run out of memory from the duplication of memory consumption. If > we wait until core dumping is complete to restart, we're in the same > scenario as before with the core being truncated or we miss recovery > time objectives by waiting too long. > > For this reason, I wouldn't say we're using the kernel to work around a > visibility issue or that systemd is creating the truncation problem, but > rather that the issue exists due to limitations in how we're truncating > cores. That being said, there might be some use in this type of > visibility for others with less strict recovery time objectives or > applications with a lower memory footprint. Yeah, this is interesting. This effectively makes the coredumping activity rather "critical path": the replacement process can't start until the dump has finished... hmm. It feels like there should be a way to move the dumping process aside, but with all the VMAs still live, I can see how this might go weird. I'll think some more about this... -- Kees Cook