Re: [PATCH v8 07/18] nfsd: add "localio" support

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Fri, 28 Jun 2024 14:40:12 +0000

> On Jun 27, 2024, at 11:35 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Fri, 28 Jun 2024, Chuck Lever III wrote:
>> 
>>> On Jun 27, 2024, at 1:27 PM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>>> On Thu, Jun 27, 2024 at 12:07:03PM -0400, Chuck Lever wrote:
>>>> On Thu, Jun 27, 2024 at 11:48:09AM -0400, Jeff Layton wrote:
>>>>> 
>>>>> Chuck mentioned this earlier, but I don't think we ought to merge the
>>>>> dprintks. If they're useful for debugging then they should be turned
>>>>> into tracepoints. This one, I'd probably just drop.
>>>> 
>>>> Right; the problem with dprintk() is they can create so much chatter
>>>> that the systemd journal will automatically toss those messages and
>>>> they are lost. No diagnostic value in that.
>>>> 
>>>> Also you probably won't find it useful if lots of debugging info
>>>> goes into the trace log but a handful of the stuff you are
>>>> looking for is dumped into the system journal; the two use different
>>>> timestamps and so are really hard to line up after the fact.
>>>> 
>>>> We're trying to transition away from dprintk() in NFSD for these
>>>> reasons.
>>> 
>>> OK, I understand wanting to not allow new dprintk() to be added.
>>> 
>>> Meanwhile:
>>> $ grep -ri dprintk fs/nfsd/*.[ch]  | wc -l
>>>    181
>>> 
>>> So I'm struggling to get motivated to convert to tracepoints.  Feels
>>> like a needless make-work hurdle, these could be converted by others
>>> more proficient with tracepoints if/when needed.
>>> 
>>> Making everyone have to be proficient at developing debugging via
>>> tracepoints seems misplaced (but I also understand that forcing others
>>> to fish enables "others" to not be doing the conversion work).
>> 
>> Trace points are part of the cost of contributing to NFSD,
>> just like XDR, RCU, lockdep_assert, and dozens of other
>> kernel facilities. Not a hurdle, and I don't ask for busy
>> work changes.
> 
> I think trace points are quite different from the other facilities you
> highlight.
> You need to know XDR and RCU etc to get correct performant code.  If you
> get it wrong, then the code won't work or (hopefully) a reviewer will
> tell you.
> 
> But trace points .... when and where are they really useful?  The answer
> to that question is nowhere near as clear cut.

I disagree; see below.

> While I'm sure they can be useful, I rarely find them to be so.  I've
> certainly had a few positive experiences, but also seen a lot of noise
> that doesn't really help me with the particular behaviour that I'm
> trying to analyse.  system-tap can be incredibly useful as it is
> targeted.  Fixed trace points are (for me) only occasionally useful.

Some of Oracle's customers, for example, refuse to use out-of-band
debugging facilities like BPF or systemtap because that requires
bespoke case-specific code to be written. They feel that enabling
any lightly-tested code at a kernel privilege level on heavily-used
production systems introduces an unacceptable risk of crashing such
systems. (I'm told by Red Hat support engineers that they have
heard the same concerns.)

dprintk impacts thread timing and has a heavy performance penalty.
It can also run the root file system out of space; thus it's not
something that can be left enabled for long periods of time. It
has no mechanisms for data reduction during capture. So it's
simply not a viable player in most live debugging scenarios.

If you prefer systemtap or BPF, you are still free to use those
instead! However, built-in tracing is the only choice for the
above cases, and it has to be part of the source code.

> I think it would be good to know if localio is active - maybe something
> in /proc/self/mountinfo could provide that.
> I think it might be useful to know what server-uuid each server and each
> mount was using.  The client could again have it in
> /proc/self/mountinfo.  The server ...  maybe in /proc/fs/nfsd/, maybe
> available over netlink...

Netlink is where we are adding such things these days.

> just fyi, the most valuable part of the dprintk debugging in my
> experience is the rpc_show_tasks() call when rpc debugging is turned on
> or off.  This view into the current status can be very useful.

NFSD now has a similar facility via netlink.

Note that the client's "show tasks" mechanism can also be
accessed via /sys.

--
Chuck Lever