Re: [PATCH] NFSD: remove redundant dprintk in exp_rootfh

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 8 Apr 2024 22:08:16 -0400

On Tue, Apr 09, 2024 at 11:12:12AM +1000, NeilBrown wrote:
> On Tue, 09 Apr 2024, Chuck Lever wrote:
> > On Tue, Apr 09, 2024 at 09:21:54AM +1000, NeilBrown wrote:
> > > On Tue, 09 Apr 2024, Chen Hanxiao wrote:
> > > > trace_nfsd_ctl_filehandle in write_filehandle has
> > > > some similar infos.
> > > 
> > > Not all that similar.  The dprintk you are removing includes the inode
> > > number and sb->s_id which the trace point don't include.
> > > 
> > > Why do you think that information isn't needed?
> > 
> > I asked him to remove that dprintk.
> > 
> > Can you say why you think that information is useful to provide
> > via a dprintk? It doesn't seem useful to system administrators,
> > IMO.
> 
> I'm not saying it is useful, but I don't think the onus is on me.

No the onus isn't on you. I'm merely asking for feedback (and I
explain why below).

> When removing tracing information, the commit message should at least
> acknowledge what is being removed and make some attempt to justify it.
> In this case the commit message claimed that nothing was being removed,
> which is clearly false.  Maybe just the commit message needs to be
> fixed.

Agreed, the patch description could have more detail and a proper
justification.

> I don't think these tracepoints are just for system administrators.
> They are also for tech support when they are trying to remotely diagnose
> a customer problem.  It is really hard to know what will actually be
> useful.  Many times I have found that the particular information that I
> want isn't available in any tracing.  I expect this is inevitable.  We
> cannot trace EVERYTHING so there will always be gaps.

When there is no in-code tracepoint available, the usual course of
action these days is to wheel out BPF, systemtap, or drgn. Distro
support engineers know how to do that. Server administrators might
not be so well trained, so I consider dprintk() of primary
importance to them.

Here the dprintk() is reporting information that seems useful only
to kernel developers. That's a code smell IMO. And the guidance in
these cases, historically, has been to remove such observability
either before a patch is merged, or after the fact, as we're doing
here.

> But removing some information that was previously generated seems
> like a regression that should at least be justified.
> In the case of write_filehandle() we are now tracing the request, but
> not the result.  Is this generally sensible?

Let's instead look at the specific situation. The purpose of the
nfsd_ctl_* tracepoints is to record in the trace log when 
configuration changes are made, in order to juxtapose those with
other server operation.

So, here, it's quite sensible. We want to observe the information
that was passed from user space, and the starting timestamp.

> Would it not make sense to
> trace both after the core of the function (exp_rootfh) has completed?

In some cases there are already tracepoints that would report or
infer the new state, so reporting a result would be redundant in
those cases.

> At lease the knfsd_fh_hash() of the generated filehandle could be
> reported.

Chen's original patch replaced the dprintk with a tracepoint. So I
asked, a week or so ago, for exactly this kind of feedback. There
have been no responses until now. Therefore it seemed logical to
assume no-one had a use for this info.

The folks whom this information would serve have been silent to date
with any specific suggestions, and usually we hear from someone
quickly when removing observability that is depended upon.

We could record the FH hash, but what would it be used for? User
space requested the FH, which can be reported by the requesting
program.

I'm not hearing a convincing specific justification for maintaining
observability here.... but we have a few more weeks before making a
more permanent decision.

-- 
Chuck Lever