Re: [PATCH v3 09/19] unwind: Introduce sframe user space unwinding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+cc bpf@ where people care about stack traces and profiling as well
(please cc bpf@xxxxxxxxxxxxxxx for next revisions as well, I'm sure a
bunch of folks would appreciate it and have something useful to say)

On Thu, Oct 31, 2024 at 4:03 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> On Thu, Oct 31, 2024 at 01:57:10PM -0700, Andrii Nakryiko wrote:
> > > > what if ip_off is larger than 4GB? ELF section can be bigger than 4GB, right?
> > >
> > > That's baked into sframe v2.
> >
> > I believe we do have large production binaries with more than 4GB of
> > text, what are we going to do about them? It would be interesting to
> > hear sframe people's opinion. Adding such a far-reaching new format in
> > 2024 with these limitations is kind of sad. At the very least maybe we
> > should allow some form of chaining sframe definitions to cover more
> > than 4GB segments? Please CC relevant folks, I'm wondering what
> > they're thinking about this.
>
> Personally I find the idea of a single 4GB+ text segment pretty
> surprising as I've never seen anything even close to that.

I grabbed one of Meta production servers running one of the big-ish (I
don't know if it's even the largest, most probably not) service. Here
are first few sections from /proc/pid/maps belonging to the main
binary:

00200000-170ad000 r--p 00000000 07:01 5
172ac000-498e7000 r-xp 16eac000 07:01 5
49ae7000-49b8b000 r--p 494e7000 07:01 5
49d8b000-4a228000 rw-p 4958b000 07:01 5
4a228000-4c677000 rw-p 00000000 00:00 0
4c800000-4ca00000 r-xp 49c00000 07:01 5
4ca00000-4f600000 r-xp 49e00000 07:01 5
4f600000-5b270000 r-xp 4ca00000 07:01 5

Few observations:

1) There are 4 executable segments in just the first 8 entries.
2) Their total size is already approaching 1.5GB:

>>> ((0x170ad000 - 0x200000) + (0x5b270000 - 0x4f600000) + (0x498e7000 - 0x172ac000)) / 1024 / 1024
1361.34375

I don't know about you, but from my experience things like code size
tend to just grow over time, rarely it shrinks (and even that usually
requires tremendous and focused efforts).

>
> Anyway it's iterative development and not everybody's requirements are
> clear from day 1.  Which is why we're discussing it now.  I think there
> are already plans to do an sframe v3.

Of course, which is why I'm providing this feedback. But it would be
nice to avoid having to support a zoo of versions if we already know
there are practical limitations that we are not that far from hitting.

>
> > > > > +                       if (text_vma) {
> > > > > +                               pr_warn_once("%s[%d]: multiple EXEC segments unsupported\n",
> > > > > +                                            current->comm, current->pid);
> > > >
> > > > is this just something that fundamentally can't be supported by SFrame
> > > > format? Or just an implementation simplification?
> > >
> > > It's a simplification I suppose.
> >
> > That's a rather random limitation, IMO... How hard would it be to not
> > make that assumption?
>
> It's definitely not random, there's no need to complicate the code if
> this condition doesn't exist.

Sorry, I'm probably dense and missing something. But from the example
process above, isn't this check violated already? Or it's two
different things? Not sure, honestly.

>
> > > > It's not illegal to have an executable with multiple VM_EXEC segments,
> > > > no? Should this be a pr_warn_once() then?
> > >
> > > I don't know, is it allowed?  I've never seen it in practice.  The
> >
> > I'm pretty sure you can do that with a custom linker script, at the
> > very least. Normally this probably won't happen, but I don't think
> > Linux dictates how many executable VMAs an application can have.
> > And it probably just naturally happens for JIT-ted applications (Java,
> > Go, etc).
>
> Actually I just double checked and even the kernel's ELF loader assumes
> that each executable has only a single text start+end address pair.

See above, very confused by such assumptions, but I'm hoping we are
talking about two different things here.

>
> > > pr_warn_once() is not reporting that it's illegal but rather that this
> > > corner case actually exists and maybe needs to be looked at.
> >
> > This warn() will be logged across millions of machines in the fleet,
> > triggering alarms, people looking at this, making custom internal
> > patches to disable the known-to-happen warn. Why do we need all this?
> > This is an issue that is trivial to trigger by user process that's not
> > doing anything illegal. Why?
>
> There's no point in adding complexity to support some hypothetical.  I
> can remove the printk though.

We are talking about fundamental things like format for supporting
frame pointer-less stack trace capture. It will take years to adopt
SFrame everywhere, so I think it's prudent to think a bit ahead beyond
just saying "no real application should need more than 4GB text", IMO.

>
> --
> Josh





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux