Re: [PATCH v2 0/6] Improve visibility of writeback

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Wed, 3 Apr 2024 15:06:56 -0400

On Wed, Apr 03, 2024 at 08:44:05AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Wed, Apr 03, 2024 at 06:27:16PM +0200, Jan Kara wrote:
> > Yeah, BPF is great and I use it but to fill in some cases from practice,
> > there are sysadmins refusing to install bcc or run your BPF scripts on
> > their systems due to company regulations, their personal fear, or whatever.
> > So debugging with what you can achieve from a shell is still the thing
> > quite often.
> 
> Yeah, I mean, this happens with anything new. Tracing itself took quite a
> while to be adopted widely. BPF, bcc, bpftrace are all still pretty new and
> it's likely that the adoption line will keep shifting for quite a while.
> Besides, even with all the new gizmos there definitely are cases where good
> ol' cat interface makes sense.
> 
> So, if the static interface makes sense, we add it but we should keep in
> mind that the trade-offs for adding such static infrastructure, especially
> for the ones which aren't *widely* useful, are rather quickly shfiting in
> the less favorable direction.

A lot of our static debug infrastructure isn't that useful because it
just sucks.

Every time I hit a sysfs or procfs file that's just a single integer,
and nothing else, when clearly there's internal structure and
description that needs to be there I die a little inside. It's lazy and
amateurish.

I regularly debug things in bcachefs over IRC in about 5-10 minutes of
asking to check various files and pastebin them - this is my normal
process, I pretty much never have to ssh and touch the actual machines.

That's how it should be if you just make a point of making your internal
state easy to view and introspect, but when I'm debugging issues that
run into the wider block layer, or memory reclaim, we often hit a wall.

Writeback throttling was buggy for _months_, no visibility or
introspection or concerns for debugging, and that's a small chunk of
code. io_uring - had to disable it. I _still_ have people bringing
issues to me that are clearly memory reclaim related but I don't have
the tools.

It's not like any of this code exports much in the way of useful
tracepoints either, but tracepoints often just aren't what you want;
what you want just to be able to see internal state (_without_ having to
use a debugger, because that's completely impractical outside highly
controlled environments) - and tracing is also never the first thing you
want to reach for when you have a user asking you "hey, this thing went
wonky, what's it doing?" - tracing automatically turns it into a multi
step process of decide what you want to look at, run the workload more
to collect data, iterate.

Think more about "what would make code easier to debug" and less about
"how do I shove this round peg through the square tracing/BPF slot".
There's _way_ more we could be doing that would just make our lives
easier.