Re: [PATCH v3 0/7] perf: Add ioctl for PMU driver configuration

Suzuki K Poulose <suzuki.poulose@xxxxxxx> · Tue, 21 Aug 2018 09:39:22 +0100

On 08/20/2018 04:25 PM, Kim Phillips wrote:
On Mon, 20 Aug 2018 15:36:47 +0100
Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:

On 08/20/2018 03:22 PM, Kim Phillips wrote:
On Mon, 20 Aug 2018 11:03:03 +0100
Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:

On 08/16/2018 08:28 PM, Mathieu Poirier wrote:
On Wed, 15 Aug 2018 at 09:28, Kim Phillips <kim.phillips@xxxxxxx> wrote:

On Wed, 15 Aug 2018 10:39:13 +0100
Will Deacon <will.deacon@xxxxxxx> wrote:

On Tue, Aug 14, 2018 at 01:42:27PM -0600, Mathieu Poirier wrote:
On Tue, 14 Aug 2018 at 11:09, Kim Phillips <kim.phillips@xxxxxxx> wrote:
The other thing that's going on here is that I'm becoming numb to the
loathsome "failed to mmap with 12 (Cannot allocate memory)" being
returned no matter what the error is/was. E.g., an error that would
indicate a sense of non-implementation would be much better
appreciated than presumably what the above is doing, i.e., returning
-ENOMEM.  That, backed up with specific details in the form of human
readable text in dmesg would be *most* welcome.

As part of the refactoring of the code to support CPU-wide scenarios I
intend to emit better diagnostic messages from the driver.  Modifying
rb_alloc_aux() to propagate the error message generated by the
architecture specific PMUs doesn't look hard either and I _may_ get to
it as part of this work.

For the record, I will continue to oppose PMU drivers that dump diagnostics
about user-controlled input into dmesg, but the coresight drivers are yours
so it's up to you and I won't get in the way!

That sounds technically self-contradicting to me.  Why shouldn't
coresight share the same policies as those used for PMU drivers?  Or
why not allow the individual vendor PMU driver authors control the
level of user-friendliness of their own drivers?

That being said, Matheiu, would you accept patches that make coresight
more verbose in dmesg?

It depends on the issue you're hoping to address.  I'd rather see the
root cause of the problem fixed than adding temporary code.  Suzuki
added the ETR perf API and I'm currently working on CPU-wide
scenarios.  From there and with regards to what can happen in
setup_aux(), we should have things covered.

I think the main issue is the lack of error code propagation from
setup_aux() back to the perf_aux_output_handle_begin(), which always
return -ENOMEM. If we fix that, we could get better idea of whats
wrong.

Why get a better idea when we can get the exact details?

The different values for error numbers are there for a reason...

But the same error number, e.g., EINVAL, can be returned for different
reasons.

True, but then you can narrow it down by tuning it. We do that for
several syscalls without printing any useful messages to debug. Why
should the perf syscalls be any different ?

If someone is planning to add verbose messages, they may do so by adding
dev_dbg() / pr_debug(), which can be turned on as and when needed.

I disagree:  that just adds another usage and kernel configuration
obstacle.  Why not use pr_err straight up?

I personally don't agree to usage of pr_err() in paths which are easily
triggered from user input.

Why not?  pr_* are ratelimited.

Also, we are moving all the "debugging"
messages to the dynamic debug, to prevent lockdep splats.

Are you referring to this?:

https://lkml.org/lkml/2018/5/1/7

Definitely not that thread.

Re-reading the thread, AFAICT, it was never proven that the splats
occurred due to the dev_infos, and there's no dev_info in this
stacktrace:

https://lkml.org/lkml/2018/5/10/269

That doesn't have the stack trace. Mathieu was also able to reproduce
the lockdep splat involving console lock lately. Unfortunately none of
these were captured here.

But even if it were, wouldn't the splats also occur with dev_dbg?

Not normally. dev_dbg() has to be turned on manually and the user knows
what he is doing. That allows the normal user to use the system without
any trouble.

Suzuki