Re: [PATCH v3 0/7] perf: Add ioctl for PMU driver configuration

Kim Phillips <kim.phillips@xxxxxxx> · Tue, 21 Aug 2018 14:17:25 -0500

On Tue, 21 Aug 2018 11:16:49 -0600
Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> wrote:

> On Mon, 20 Aug 2018 at 08:36, Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:
> >
> > On 08/20/2018 03:22 PM, Kim Phillips wrote:
> > > On Mon, 20 Aug 2018 11:03:03 +0100
> > > Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:
> > >
> > >> On 08/16/2018 08:28 PM, Mathieu Poirier wrote:
> > >>> On Wed, 15 Aug 2018 at 09:28, Kim Phillips <kim.phillips@xxxxxxx> wrote:
> > >>>>
> > >>>> On Wed, 15 Aug 2018 10:39:13 +0100
> > >>>> Will Deacon <will.deacon@xxxxxxx> wrote:
> > >>>>
> > >>>>> On Tue, Aug 14, 2018 at 01:42:27PM -0600, Mathieu Poirier wrote:
> > >>>>>> On Tue, 14 Aug 2018 at 11:09, Kim Phillips <kim.phillips@xxxxxxx> wrote:
> > >>>>>>> The other thing that's going on here is that I'm becoming numb to the
> > >>>>>>> loathsome "failed to mmap with 12 (Cannot allocate memory)" being
> > >>>>>>> returned no matter what the error is/was. E.g., an error that would
> > >>>>>>> indicate a sense of non-implementation would be much better
> > >>>>>>> appreciated than presumably what the above is doing, i.e., returning
> > >>>>>>> -ENOMEM.  That, backed up with specific details in the form of human
> > >>>>>>> readable text in dmesg would be *most* welcome.
> > >>>>>>
> > >>>>>> As part of the refactoring of the code to support CPU-wide scenarios I
> > >>>>>> intend to emit better diagnostic messages from the driver.  Modifying
> > >>>>>> rb_alloc_aux() to propagate the error message generated by the
> > >>>>>> architecture specific PMUs doesn't look hard either and I _may_ get to
> > >>>>>> it as part of this work.
> > >>>>>
> > >>>>> For the record, I will continue to oppose PMU drivers that dump diagnostics
> > >>>>> about user-controlled input into dmesg, but the coresight drivers are yours
> > >>>>> so it's up to you and I won't get in the way!
> > >>>>
> > >>>> That sounds technically self-contradicting to me.  Why shouldn't
> > >>>> coresight share the same policies as those used for PMU drivers?  Or
> > >>>> why not allow the individual vendor PMU driver authors control the
> > >>>> level of user-friendliness of their own drivers?
> > >>>>
> > >>>> That being said, Matheiu, would you accept patches that make coresight
> > >>>> more verbose in dmesg?
> > >>>
> > >>> It depends on the issue you're hoping to address.  I'd rather see the
> > >>> root cause of the problem fixed than adding temporary code.  Suzuki
> > >>> added the ETR perf API and I'm currently working on CPU-wide
> > >>> scenarios.  From there and with regards to what can happen in
> > >>> setup_aux(), we should have things covered.
> > >>
> > >> I think the main issue is the lack of error code propagation from
> > >> setup_aux() back to the perf_aux_output_handle_begin(), which always
> > >> return -ENOMEM. If we fix that, we could get better idea of whats
> > >> wrong.
> > >
> > > Why get a better idea when we can get the exact details?
> >
> > The different values for error numbers are there for a reason...
> >
> > >
> > >> If someone is planning to add verbose messages, they may do so by adding
> > >> dev_dbg() / pr_debug(), which can be turned on as and when needed.
> > >
> > > I disagree:  that just adds another usage and kernel configuration
> > > obstacle.  Why not use pr_err straight up?
> 
> I think everything on this topic has been said already.  As I remarked

Apparently not :)

> earlier once I'm done with CPU-wide support we shouldn't need detailed
> error reporting.  Everything should be handled via error code
> propagation and that is what I'd like to see addressed in the first
> place.  From there we can think about individual error cases as they
> come up.

FYI, there are a lot of -EINVAL cases:

$ git grep -- -EINVAL drivers/hwtracing/coresight/ | wc -l
142

> > I personally don't agree to usage of pr_err() in paths which are easily
> > triggered from user input. Also, we are moving all the "debugging"
> > messages to the dynamic debug, to prevent lockdep splats.
> 
> A slight correction here - we are moving most of the framework error
> reporting to dynamic debug because they clog the log file and aren't
> useful outside of a development context.  It is not a remedy for the

I didn't see anyone complain about any 'clogging', and I sure don't
mind seeing the extra messaging, esp. when using coresight in sysfs
mode: it's reinforcement to the user that it's doing work.

> negative interaction between event locking and console access
> generated by the framework's reporting of device status as a path is
> built/enabled/disabled.

I'd like to see this splat from this 'negative interaction', and it be
fixed before we allow code in that makes it silently go away...

Kim