Re: [PATCH v2 0/6] Improve visibility of writeback

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Thu, 28 Mar 2024 15:55:32 -0400

On Thu, Mar 28, 2024 at 09:46:39AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Thu, Mar 28, 2024 at 03:40:02PM -0400, Kent Overstreet wrote:
> > Collecting latency numbers at various key places is _enormously_ useful.
> > The hard part is deciding where it's useful to collect; that requires
> > intimate knowledge of the code. Once you're defining those collection
> > points statically, doing it with BPF is just another useless layer of
> > indirection.
> 
> Given how much flexibility helps with debugging, claiming it useless is a
> stretch.

Well, what would it add?

> > The time stats stuff I wrote is _really_ cheap, and you really want this
> > stuff always on so that you've actually got the data you need when
> > you're bughunting.
> 
> For some stats and some use cases, always being available is useful and
> building fixed infra for them makes sense. For other stats and other use
> cases, flexibility is pretty useful too (e.g. what if you want percentile
> distribution which is filtered by some criteria?). They aren't mutually
> exclusive and I'm not sure bdi wb instrumentation is on top of enough
> people's minds.
> 
> As for overhead, BPF instrumentation can be _really_ cheap too. We often run
> these programs per packet.

The main things I want are just
 - elapsed time since last writeback IO completed, so we can see at a
   glance if it's stalled
 - time stats on writeback IO, from initiation to completion

The main value of this one will be tracking down tail latency issues and
finding out where in the stack they originate.
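
To make the two items above concrete, here is a rough userspace sketch of
what such a stats object could track. It is not the actual time stats code
and not part of this patch series: the struct, its fields, and the
wb_stats_* helpers are all hypothetical names, timestamps are assumed to be
nanoseconds from a monotonic clock, and locking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration only: none of these names come from the
 * kernel or from the patch series under discussion.
 */
struct wb_time_stats {
	uint64_t count;        /* number of completed writeback IOs */
	uint64_t total_ns;     /* sum of initiation-to-completion times */
	uint64_t min_ns;       /* shortest duration seen so far */
	uint64_t max_ns;       /* longest duration seen so far */
	uint64_t last_done_ns; /* timestamp of the most recent completion */
};

/* Record one IO that was initiated at start_ns and completed at end_ns. */
void wb_stats_complete(struct wb_time_stats *s,
		       uint64_t start_ns, uint64_t end_ns)
{
	uint64_t d;

	assert(end_ns >= start_ns);
	d = end_ns - start_ns;

	if (s->count == 0 || d < s->min_ns)
		s->min_ns = d;
	if (d > s->max_ns)
		s->max_ns = d;

	s->total_ns += d;
	s->count++;
	s->last_done_ns = end_ns;
}

/* Elapsed time since the last completion; 0 if nothing has completed yet. */
uint64_t wb_stats_since_last(const struct wb_time_stats *s, uint64_t now_ns)
{
	return s->count ? now_ns - s->last_done_ns : 0;
}

/* Mean initiation-to-completion time; 0 if nothing has completed yet. */
uint64_t wb_stats_mean(const struct wb_time_stats *s)
{
	return s->count ? s->total_ns / s->count : 0;
}
```

A large wb_stats_since_last() value while IO is still outstanding is the
"stalled at a glance" signal; the min/max/mean fields are the cheapest form
of the initiation-to-completion stats, and a real implementation would
likely also want quantiles for the tail latency case.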