On 02/02/2023 17:11, Teres Alexis, Alan Previn wrote:
On Thu, 2023-02-02 at 08:43 +0000, Tvrtko Ursulin wrote:
On 02/02/2023 08:13, Alan Previn wrote:
The MESA driver creates a protected context on every driver handle
initialization to query the caps bit for the app. So when running CI tests,
they are observing hundreds of drm_errors when PXP is enabled
in .config but the SOC or BIOS configuration cannot support
PXP sessions.
Update the error handling code to be more selective about which errors
are reported as drm_error vs drm_WARN_ONCE vs drm_debug.
Don't completely remove all FW error replies (at least keep them
but use drm_debug) or else customers who really need to know that
content protection failed won't be aware of it when debugging.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
How does this relate to b762787bf767 ("drm/i915/pxp: Use drm_dbg if arb
session failed due to fw version") which I thought was already fixing
the drm_error spam caused by userspace probing?
Good question. That previous error was specific to a board that was using
an outdated firmware version that really needed to be upgraded.
At that point I wasn't aware of the fact that MESA was seeing
a high frequency of this failure that is tied to platform issues
(BIOS configuration / SOC fusing). Also, I believe in the prior case
PXP was not enabled by default in the .config in all testing.
In this latest reported bug (I realized I forgot to include the bug no. for this
new patch - https://gitlab.freedesktop.org/drm/intel/-/issues/7706#note_1746952),
I was informed that PXP is being enabled by default and that some
DUT hardware was not PXP-capable (SOC fusing / BIOS config).
So with this patch, I am trying to balance between issues that are critical
but are root-caused by HW/platform gaps (louder drm-warn - but just ONCE)
vs other cases where the failure could also come from the hw/sw state machine (which cannot
be a WARN_ONCE message since it can occur due to runtime operation events).
One thing to note: I am pushing-for / waiting-on our firmware team to get
blessing on more fw-error-code to error-string translations that can be allowed
upstream, which is why I added "pxp_fw_err_to_string" and a single
"drm_dbg", so that in future we don't have to keep adding whole new lines of
code to multiple functions for each new error code, and can instead
just add the new err-code-to-string entry in a single location.
note: i will re-rev with the bug id.
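For readers following the thread, the single-location translation idea described above can be sketched in plain C roughly as follows. This is only an illustration, not the code from the patch: the enum values, the strings, and the name example_fw_err_to_string are all hypothetical placeholders, and the real firmware status codes are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical firmware status codes, for illustration only.
 * These are NOT the real PXP firmware values.
 */
enum example_fw_status {
	EXAMPLE_FW_STATUS_SUCCESS = 0x0,
	EXAMPLE_FW_STATUS_OP_NOT_PERMITTED = 0x1,
	EXAMPLE_FW_STATUS_PLATFORM_CONFIG = 0x2,
};

struct example_fw_err_entry {
	uint32_t id;
	const char *name;
};

/* The one place a new code-to-string translation would be added. */
static const struct example_fw_err_entry example_fw_err_table[] = {
	{ EXAMPLE_FW_STATUS_OP_NOT_PERMITTED, "ERR_OP_NOT_PERMITTED" },
	{ EXAMPLE_FW_STATUS_PLATFORM_CONFIG, "ERR_PLATFORM_CONFIG" },
};

/* Returns the string for a known status code, or NULL if it is not in the table. */
const char *example_fw_err_to_string(uint32_t id)
{
	size_t n = sizeof(example_fw_err_table) / sizeof(example_fw_err_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (example_fw_err_table[i].id == id)
			return example_fw_err_table[i].name;
	}
	return NULL;
}
```

With a helper shaped like this, each caller needs only one debug-level print of the returned string (falling back to the raw code when the result is NULL), and supporting a newly approved error code means adding one table entry rather than touching every function that handles firmware replies.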
Thanks for the details. Yes definitely avoid any drm_warn/err/WARN on
invalid conditions/usage that can be triggered from userspace.
And given the bug report is about TGL, probably try to add a Fixes: tag
with an appropriate target too, so that there are fewer duplicate bug reports
from the released kernels.
Regards,
Tvrtko