Re: [PATCH v3 00/35] Memory allocation profiling

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Tue, 13 Feb 2024 18:24:03 -0500

On Tue, Feb 13, 2024 at 03:11:15PM -0800, Darrick J. Wong wrote:
> On Tue, Feb 13, 2024 at 05:29:03PM -0500, Kent Overstreet wrote:
> > On Tue, Feb 13, 2024 at 11:17:32PM +0100, David Hildenbrand wrote:
> > > On 13.02.24 23:09, Kent Overstreet wrote:
> > > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
> > > > > On 13.02.24 22:58, Suren Baghdasaryan wrote:
> > > > > > On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > > > > > 
> > > > > > > On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
> > > > > > > [...]
> > > > > > > > We're aiming to get this in the next merge window, for 6.9. The feedback
> > > > > > > > we've gotten has been that even out of tree this patchset has already
> > > > > > > > been useful, and there's a significant amount of other work gated on the
> > > > > > > > code tagging functionality included in this patchset [2].
> > > > > > > 
> > > > > > > I suspect it will not come as a surprise that I really dislike the
> > > > > > > implementation proposed here. I will not repeat my arguments, I have
> > > > > > > done so on several occasions already.
> > > > > > > 
> > > > > > > Anyway, I didn't go as far as to nak it even though I _strongly_ believe
> > > > > > > this debugging feature will add a maintenance overhead for a very long
> > > > > > > time. I can live with all the downsides of the proposed implementation
> > > > > > > _as long as_ there is a wider agreement from the MM community as this is
> > > > > > > where the maintenance cost will be payed. So far I have not seen (m)any
> > > > > > > acks by MM developers so aiming into the next merge window is more than
> > > > > > > little rushed.
> > > > > >
> > > > > > We tried other previously proposed approaches and all have their
> > > > > > downsides without making maintenance much easier. Your position is
> > > > > > understandable and I think it's fair. Let's see if others see more
> > > > > > benefit than cost here.
> > > > > 
> > > > > Would it make sense to discuss that at LSF/MM once again, especially
> > > > > covering why proposed alternatives did not work out? LSF/MM is not "too far"
> > > > > away (May).
> 
> You want to stall this effort for *three months* to schedule a meeting?
> 
> I would love to have better profiling of memory allocations inside XFS
> so that we can answer questions like "What's going on with these
> allocation stalls?" or "What memory is getting used, and where?" more
> quickly than we can now.
> 
> Right now I get to stare at tracepoints and printk crap until my eyes
> bleed, and maybe dchinner comes to my rescue and figures out what's
> going on sooner than I do.  More often we just never figure it out
> because only the customer can reproduce the problem, the reams of data
> produced by ftrace is unmanageable, and BPF isn't always available.
> 
> I'm not thrilled by the large increase in macro crap in the allocation
> paths, but I don't know of a better way to instrument things.  Our
> attempts to use _RET_IP in XFS to instrument its code paths have never
> worked quite right w.r.t. inlined functions and whatnot.
> 
> > > > > I recall that the last LSF/MM session on this topic was a bit unfortunate
> > > > > (IMHO not as productive as it could have been). Maybe we can finally reach a
> > > > > consensus on this.
> 
> From my outsider's perspective, nine months have gone by since the last
> LSF.  Who has come up with a cleaner/better/faster way to do what Suren
> and Kent have done?  Were those code changes integrated into this
> patchset?  Or why not?
> 
> Most of what I saw in 2023 involved compiler changes (great; now I have
> to wait until RHEL 11/Debian 14 to use it) and/or still utilize fugly
> macros.
> 
> Recalling all the way back to suggestions made in 2022, who wrote the
> prototype for doing this via ftrace?  Or BPF?  How well did that go for
> counting allocation events and the like?  I saw Tejun saying something
> about how they use BPF aggressively inside Meta, but that was about it.
> 
> Were any of those solutions significantly better than what's in front of
> us here?
> 
> I get it, a giant patch forcing everyone to know the difference between
> alloc_foo and alloc_foo_noperf offends my (yours?) stylistic
> sensibilities.  On the other side, making analysis easier during
> customer escalations means we kernel people get data, answers, and
> solutions sooner instead of watching all our time get eaten up on L4
> support and backporting hell.
> 
> > > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> > > > need to see a serious proposl - what we had at the last LSF was people
> > > > jumping in with half baked alternative proposals that very much hadn't
> > > > been thought through, and I see no need to repeat that.
> > > > 
> > > > Like I mentioned, there's other work gated on this patchset; if people
> > > > want to hold this up for more discussion they better be putting forth
> > > > something to discuss.
> > > 
> > > I'm thinking of ways on how to achieve Michal's request: "as long as there
> > > is a wider agreement from the MM community". If we can achieve that without
> > > LSF, great! (a bi-weekly MM meeting might also be an option)
> > 
> > A meeting wouldn't be out of the question, _if_ there is an agenda, but:
> > 
> > What's that coffeee mug say? I just survived another meeting that
> > could've been an email?
> 
> I congratulate you on your memory of my kitchen mug.  Yes, that's what
> it says.
> 
> > What exactly is the outcome we're looking for?
> > 
> > Is there info that people are looking for? I think we summed things up
> > pretty well in the cover letter; if there are specifics that people
> > want to discuss, that's why we emailed the series out.
> > 
> > There's people in this thread who've used this patchset in production
> > and diagnosed real issues (gigabytes of memory gone missing, I heard the
> > other day); I'm personally looking for them to chime in on this thread
> > (Johannes, Pasha).
> > 
> > If it's just grumbling about "maintenance overhead" we need to get past
> > - well, people are going to have to accept that we can't deliver
> > features without writing code, and I'm confident that the hooking in
> > particular is about as clean as it's going to get, _regardless_ of
> > toolchain support; and moreover it addresses what's been historically a
> > pretty gaping hole in our ability to profile and understand the code we
> > write.
> 
> Are you and Suren willing to pay whatever maintenance overhead there is?

I'm still wondering what this supposed "maintenance overhead" is going
to be...

As I use this patch series I occasionally notice places where a bunch of
memory is being accounted to one line of code, and it would better be
accounted to a caller - but then it's just a couple lines of code to fix
that. You switch that callsite to the _noprof() version of whatever
allocation it's doing, then add an alloc_hooks() wrapper at the place
you do want it accounted.

That doesn't really feel like overhead to me, just the normal tweaking
your tools to get the most out of them.

I will continue to do that for the code I'm looking at, yes.

If other people are doing that too, it'll be because they're also using
memory allocation profiling and finding it valuable.

I did notice earlier that we're still lacking documentation in the
Documentation/ directory; the workflow for "how you shift accounting to
the right spot" is something that should go in there.