+cc bpf@ where people care about stack traces and profiling as well (please cc bpf@xxxxxxxxxxxxxxx for next revisions as well, I'm sure a bunch of folks would appreciate it and have something useful to say) On Thu, Oct 31, 2024 at 4:03 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > > On Thu, Oct 31, 2024 at 01:57:10PM -0700, Andrii Nakryiko wrote: > > > > what if ip_off is larger than 4GB? ELF section can be bigger than 4GB, right? > > > > > > That's baked into sframe v2. > > > > I believe we do have large production binaries with more than 4GB of > > text, what are we going to do about them? It would be interesting to > > hear sframe people's opinion. Adding such a far-reaching new format in > > 2024 with these limitations is kind of sad. At the very least maybe we > > should allow some form of chaining sframe definitions to cover more > > than 4GB segments? Please CC relevant folks, I'm wondering what > > they're thinking about this. > > Personally I find the idea of a single 4GB+ text segment pretty > surprising as I've never seen anything even close to that. I grabbed one of Meta production servers running one of the big-ish (I don't know if it's even the largest, most probably not) service. Here are first few sections from /proc/pid/maps belonging to the main binary: 00200000-170ad000 r--p 00000000 07:01 5 172ac000-498e7000 r-xp 16eac000 07:01 5 49ae7000-49b8b000 r--p 494e7000 07:01 5 49d8b000-4a228000 rw-p 4958b000 07:01 5 4a228000-4c677000 rw-p 00000000 00:00 0 4c800000-4ca00000 r-xp 49c00000 07:01 5 4ca00000-4f600000 r-xp 49e00000 07:01 5 4f600000-5b270000 r-xp 4ca00000 07:01 5 Few observations: 1) There are 4 executable segments in just the first 8 entries. 2) Their total size is already approaching 1.5GB: >>> ((0x170ad000 - 0x200000) + (0x5b270000 - 0x4f600000) + (0x498e7000 - 0x172ac000)) / 1024 / 1024 1361.34375 I don't know about you, but from my experience things like code size tend to just grow over time, rarely it shrinks (and even that usually requires tremendous and focused efforts). > > Anyway it's iterative development and not everybody's requirements are > clear from day 1. Which is why we're discussing it now. I think there > are already plans to do an sframe v3. Of course, which is why I'm providing this feedback. But it would be nice to avoid having to support a zoo of versions if we already know there are practical limitations that we are not that far from hitting. > > > > > > + if (text_vma) { > > > > > + pr_warn_once("%s[%d]: multiple EXEC segments unsupported\n", > > > > > + current->comm, current->pid); > > > > > > > > is this just something that fundamentally can't be supported by SFrame > > > > format? Or just an implementation simplification? > > > > > > It's a simplification I suppose. > > > > That's a rather random limitation, IMO... How hard would it be to not > > make that assumption? > > It's definitely not random, there's no need to complicate the code if > this condition doesn't exist. Sorry, I'm probably dense and missing something. But from the example process above, isn't this check violated already? Or it's two different things? Not sure, honestly. > > > > > It's not illegal to have an executable with multiple VM_EXEC segments, > > > > no? Should this be a pr_warn_once() then? > > > > > > I don't know, is it allowed? I've never seen it in practice. The > > > > I'm pretty sure you can do that with a custom linker script, at the > > very least. Normally this probably won't happen, but I don't think > > Linux dictates how many executable VMAs an application can have. > > And it probably just naturally happens for JIT-ted applications (Java, > > Go, etc). > > Actually I just double checked and even the kernel's ELF loader assumes > that each executable has only a single text start+end address pair. See above, very confused by such assumptions, but I'm hoping we are talking about two different things here. > > > > pr_warn_once() is not reporting that it's illegal but rather that this > > > corner case actually exists and maybe needs to be looked at. > > > > This warn() will be logged across millions of machines in the fleet, > > triggering alarms, people looking at this, making custom internal > > patches to disable the known-to-happen warn. Why do we need all this? > > This is an issue that is trivial to trigger by user process that's not > > doing anything illegal. Why? > > There's no point in adding complexity to support some hypothetical. I > can remove the printk though. We are talking about fundamental things like format for supporting frame pointer-less stack trace capture. It will take years to adopt SFrame everywhere, so I think it's prudent to think a bit ahead beyond just saying "no real application should need more than 4GB text", IMO. > > -- > Josh