Hello,

On Wed, Apr 03, 2024 at 03:06:56PM -0400, Kent Overstreet wrote:
...
> That's how it should be if you just make a point of making your internal
> state easy to view and introspect, but when I'm debugging issues that
> run into the wider block layer, or memory reclaim, we often hit a wall.

Try drgn:

  https://drgn.readthedocs.io/en/latest/

I've been adding drgn scripts under the tools/ directory for
introspection. They're easy to write, deploy and ask users to run.

> Writeback throttling was buggy for _months_, no visibility or
> introspection or concerns for debugging, and that's a small chunk of
> code. io_uring - had to disable it. I _still_ have people bringing
> issues to me that are clearly memory reclaim related but I don't have
> the tools.
>
> It's not like any of this code exports much in the way of useful
> tracepoints either, but tracepoints often just aren't what you want;
> what you want just to be able to see internal state (_without_ having to
> use a debugger, because that's completely impractical outside highly
> controlled environments) - and tracing is also never the first thing you
> want to reach for when you have a user asking you "hey, this thing went
> wonky, what's it doing?" - tracing automatically turns it into a multi
> step process of decide what you want to look at, run the workload more
> to collect data, iterate.
>
> Think more about "what would make code easier to debug" and less about
> "how do I shove this round peg through the square tracing/BPF slot".
> There's _way_ more we could be doing that would just make our lives
> easier.

Maybe it'd help to classify visibility into the following categories:

1. Current state introspection.
2. Dynamic behavior tracing.
3. Accumulative behavior profiling.

drgn is great for #1. Tracing and BPF stuff is great for #2, especially
when things get complicated. #3 is the trickiest. Static stuff is useful
in a lot of cases but BPF can also be useful in other cases.

I agree that it's all about using the right tool for the problem.

--
tejun