Re: [PATCH v2 0/6] Improve visibility of writeback

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Wed, 3 Apr 2024 18:24:25 -0400

On Wed, Apr 03, 2024 at 09:21:37AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 03, 2024 at 03:06:56PM -0400, Kent Overstreet wrote:
> ...
> > That's how it should be if you just make a point of making your internal
> > state easy to view and introspect, but when I'm debugging issues that
> > run into the wider block layer, or memory reclaim, we often hit a wall.
> 
> Try drgn:
> 
>   https://drgn.readthedocs.io/en/latest/
> 
> I've been adding drgn scripts under tools/ directory for introspection.
> They're easy to write, deploy and ask users to run.

Which is still inferior to simply writing to_text() functions for all
your objects and exposing them under sysfs/debugfs.

Plus, it's a whole new language/system for both devs and users to
learn.

And having to_text() functions makes your log and error messages way
better.
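
A rough userspace sketch of the idea, not code from any tree: struct
wb_stats, its fields and the three function names are all made up for
illustration, and plain stdio stands in for sysfs/debugfs and printk.
The point is only that one formatter can feed both the introspection
file and the error message:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical object; the fields are invented for this sketch. */
struct wb_stats {
	unsigned long nr_dirty;
	unsigned long nr_writeback;
	unsigned long bw_kbps;
};

/*
 * Render the object into buf, always NUL-terminated.
 * Returns the number of characters stored, excluding the NUL.
 */
size_t wb_stats_to_text(char *buf, size_t len, const struct wb_stats *s)
{
	int n;

	assert(buf && len > 0);

	n = snprintf(buf, len, "dirty %lu writeback %lu bw %lu kB/s",
		     s->nr_dirty, s->nr_writeback, s->bw_kbps);
	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}

	/* snprintf() returns the untruncated length; clamp to what fit. */
	return (size_t)n < len ? (size_t)n : len - 1;
}

/* Stand-in for a debugfs-style show routine. */
void wb_stats_show(FILE *out, const struct wb_stats *s)
{
	char buf[128];

	wb_stats_to_text(buf, sizeof(buf), s);
	fprintf(out, "%s\n", buf);
}

/* Stand-in for an error path: same formatter, so the log line has real state. */
void wb_stats_log_stall(FILE *log, const struct wb_stats *s)
{
	char buf[128];

	wb_stats_to_text(buf, sizeof(buf), s);
	fprintf(log, "writeback stalled: %s\n", buf);
}
```

In kernel code the buffer handling would presumably go through
something like seq_file or a printbuf rather than a fixed char array;
that part is left out here.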

"But what about code size/overhead?" - bullshit, we're talking about a
couple percent of .text for the code itself; we blow more memory on
permanent dentries/inodes due to the way our virtual filesystems work,
but that's more of a problem for tracefs.

> > Writeback throttling was buggy for _months_, no visibility or
> > introspection or concerns for debugging, and that's a small chunk of
> > code. io_uring - had to disable it. I _still_ have people bringing
> > issues to me that are clearly memory reclaim related but I don't have
> > the tools.
> > 
> > It's not like any of this code exports much in the way of useful
> > tracepoints either, but tracepoints often just aren't what you want;
> > what you want is just to be able to see internal state (_without_ having to
> > use a debugger, because that's completely impractical outside highly
> > controlled environments) - and tracing is also never the first thing you
> > want to reach for when you have a user asking you "hey, this thing went
> > wonky, what's it doing?" - tracing automatically turns it into a
> > multi-step process: decide what you want to look at, run the workload
> > more to collect data, iterate.
> > 
> > Think more about "what would make code easier to debug" and less about
> > "how do I shove this round peg through the square tracing/BPF slot".
> > There's _way_ more we could be doing that would just make our lives
> > easier.
> 
> Maybe it'd help to classify visibility into the following categories:
> 
> 1. Current state introspection.
> 2. Dynamic behavior tracing.
> 3. Accumulative behavior profiling.
> 
> drgn is great for #1. Tracing and BPF stuff is great for #2 especially when
> things get complicated. #3 is the trickiest. Static stuff is useful in a lot
> of cases but BPF can also be useful in other cases.
> 
> I agree that it's all about using the right tool for the problem.

Yeah, and you guys are all about the nerdiest and most overengineered
tools and ignoring the basics. Get the simple stuff right, /then/ if
there's stuff you still can't do, that's when you start looking at the
more complicated stuff.