Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path

Greg Banks <gnb@xxxxxxxxxxxxxxxxx> · Thu, 22 Jan 2009 13:04:33 +1100

Trond Myklebust wrote:
> On Thu, 2009-01-22 at 10:11 +1100, Greg Banks wrote:
>   
>> Trond Myklebust wrote:
>>     
>>> On Thu, 2009-01-22 at 09:36 +1100, Greg Banks wrote:
>>>   
>>>       
>>>> Chuck Lever wrote:
>>>>     
>>>>         
>>>>>       
>>>>>           
>>>> It depends on whether distros can be convinced to enable it by default,
>>>> and install by default any necessary userspace infrastructure.   The
>>>> most important thing for field debugging is Just Knowing that you have
>>>> all the bits necessary to perform useful debugging without having to
>>>> find some RPM that matches the kernel that the machine is actually
>>>> running now, and not the one that was present when the machine was
>>>> installed.
>>>>     
>>>>         
>>> Which is precisely why dprintk() is such a bad choice as a basis for a
>>> set of trace points: every new patch and bugfix that the distro applies
>>> will result in a reshuffling of the trace points as code is cleaned up
>>> and moved around or removed entirely.
>>>   
>>>       
>> Yes, if the filename and line number were the only information going
>> out.  The dprintk() format is usually enough (ignoring the patchy
>> quality of the current dprintk set)  to give a developer enough clue
>> about which dprintk is which.  Or am I missing something?
>>     
>
> The current dprintk() set was never designed to be anything other than a
> logging tool with a very coarse filter (the bitmask
> in /proc/sys/sunrpc/*_debug). It was designed to be human-readable only
> (no fixed format).
>
> As I understand it, you are not only proposing to make that filter
> extremely fine (individually addressable trace points), but also to
> enable the application of scripting tools like systemtap and LTTng in
> order to provide bespoke debugging of your customer problems. Have I
> misunderstood you, or is that correct?
>   

These are two separate proposals between which we're trying to find some
commonality.

In my proposal, the dprintk()s remain designed primarily for humans
(support staff or kernel developers) to read in conjunction with the
correct source code, but control becomes fine-grained, so that
individual dprintk()s can be switched on and off rather than whole
classes of them.  This can be done regardless of whether trace points
are involved and regardless of whether we attempt to support scripts.
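
To make the fine-grained idea concrete, here is a rough userspace sketch
of per-callsite control.  It is not the sunrpc code and not a proposed
patch; every name in it (struct dprintk_site, dprintk_control(),
dprintk_hit(), the demo_*() functions) is hypothetical.  It also assumes
GCC or clang, since it relies on the GNU ##__VA_ARGS__ extension, and it
registers a callsite lazily on first execution, where real kernel code
would more likely collect descriptors at link time.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor: one static instance per dprintk() callsite. */
struct dprintk_site {
    const char *func;            /* enclosing function name */
    const char *fmt;             /* format string, identifies the message */
    int enabled;                 /* toggled individually, default off */
    int registered;              /* set once the site is on the list */
    struct dprintk_site *next;
};

static struct dprintk_site *dprintk_sites; /* sites executed at least once */
static char dprintk_last[256];             /* last message, kept for the demo */

/* Enable or disable every known site in the named function.
 * Returns the number of sites changed. */
static int dprintk_control(const char *func, int on)
{
    int n = 0;

    for (struct dprintk_site *s = dprintk_sites; s; s = s->next) {
        if (strcmp(s->func, func) == 0) {
            s->enabled = on;
            n++;
        }
    }
    return n;
}

/* Called from every callsite: register on first use, format if enabled. */
static void dprintk_hit(struct dprintk_site *s, ...)
{
    va_list ap;

    if (!s->registered) {
        s->registered = 1;
        s->next = dprintk_sites;
        dprintk_sites = s;
    }
    if (!s->enabled)
        return;
    va_start(ap, s);
    vsnprintf(dprintk_last, sizeof(dprintk_last), s->fmt, ap);
    va_end(ap);
}

/* The callsite syntax stays the same as a plain dprintk(). */
#define dprintk(f, ...) do {                                        \
    static struct dprintk_site site_ = { .func = __func__, .fmt = f }; \
    dprintk_hit(&site_, ##__VA_ARGS__);                             \
} while (0)

/* Two demo callsites. */
static void demo_mount(const char *server)
{
    dprintk("mount: server=%s", server);
}

static void demo_umount(void)
{
    dprintk("umount");
}
```

With this sketch, dprintk_control("demo_mount", 1) turns on only the
message in demo_mount(); the one in demo_umount() stays silent.  The
output is still free-form text meant for a human with the source in hand.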

Changing dprintk() to add a trace point is just a way to get some trace
points with strictly minimal changes to callsites.
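
As a rough illustration of what "minimal changes to callsites" could
mean, the sketch below redefines the macro so that each existing
dprintk() also fires an optional probe.  This is a userspace toy, not
the kernel tracepoint API: trace_probe, print_enabled,
dprintk_and_trace() and the demo functions are all hypothetical names,
and the GNU ##__VA_ARGS__ extension is assumed.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical probe signature: gets the callsite identity and raw args. */
typedef void (*trace_probe_t)(const char *func, const char *fmt, va_list ap);

static trace_probe_t trace_probe;  /* NULL while no tracer is attached */
static int print_enabled;          /* the existing human-readable path */

static void dprintk_and_trace(const char *func, const char *fmt, ...)
{
    va_list ap;

    if (trace_probe) {             /* new: hand the event to the tracer */
        va_start(ap, fmt);
        trace_probe(func, fmt, ap);
        va_end(ap);
    }
    if (print_enabled) {           /* old: print for a human reader */
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }
}

/* Only the macro changes; the callsites below are untouched. */
#define dprintk(fmt, ...) dprintk_and_trace(__func__, fmt, ##__VA_ARGS__)

/* Demo probe: counts events and remembers the last calling function. */
static int probe_hits;
static char probe_last_func[64];

static void count_probe(const char *func, const char *fmt, va_list ap)
{
    (void)fmt;
    (void)ap;
    probe_hits++;
    snprintf(probe_last_func, sizeof(probe_last_func), "%s", func);
}

/* Demo callsite, written exactly as it would be today. */
static void demo_lookup(int status)
{
    dprintk("NFS: lookup status=%d\n", status);
}
```

Setting trace_probe = count_probe attaches the probe without touching
demo_lookup(), which is the whole attraction; the cost is that the
"event" a script sees is still just a function name and a format string.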

Replacing dprintk()s with new trace points has more or less the same
result but means more futzing with callsites.

> The question then is how is this going to work out in an environment
> where the individually addressable trace points/dprintk()s pop in and
> out of existence at the whim of a patch, and where the output format is
> similarly volatile?
> IOW: I'm referring to the difference between an interface that was
> designed purely to be interpreted by humans, and one that is designed
> from scratch to be interpreted by scripts.
>
>   

The maintenance problem of correlating any kind of instrumentation point
in kernel code with scripts living out in userspace exists regardless of
how you choose to implement the instrumentation.

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.
