On 9/17/2024 3:24 PM, Jeffrey Hugo wrote:
On 9/16/2024 1:40 PM, Brian Masney wrote:
On Mon, Sep 16, 2024 at 1:42 PM Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> wrote:
Bisect pointed to the following which makes zero sense -
[snip]
I wonder if bisect-ability got broken somehow.
I'm going to try to do a bit of a manual bisect to see if I can avoid
whatever glitch (possibly self induced) I seem to have hit.
I've seen this happen if the error is due to a race condition and only
happens part of the time. When you are testing a kernel, try booting
the system up to 3 times before you run 'git bisect good' against a
particular iteration.
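The retry advice above could be automated for use with 'git bisect run'. A minimal sketch in Python: the check callable is hypothetical (it would boot the kernel under test once and return True when the error prints are absent), and the exit codes follow the documented 'git bisect run' convention (0 marks a commit good, 1 marks it bad).

```python
from typing import Callable

GOOD = 0  # 'git bisect run' treats exit status 0 as good
BAD = 1   # exit status 1-127 (except 125) is treated as bad


def classify(check: Callable[[], bool], attempts: int = 3) -> int:
    """Return GOOD only if every attempt passes.

    A race-dependent failure may not show on the first boot, so a single
    failing attempt is enough to call the commit BAD.
    """
    for _ in range(attempts):
        if not check():
            return BAD
    return GOOD
```

A wrapper script would pass its own site-specific boot-and-check function to classify() and hand the result to sys.exit().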
Found some issues with my initial bisect effort.
New run points to:
commit 1b0e3ea9301a422003d385cda8f8dee6c878ad05
Author: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
Date: Mon Aug 14 21:16:42 2023 +0800
perf/smmuv3: Add MODULE_ALIAS for module auto loading
On my ACPI based arm64 server, if the SMMUv3 PMU is configured as
module it won't be loaded automatically after booting even if the
device has already been scanned and added. It's because the module
lacks a platform alias, the uevent mechanism and userspace tools
like udevd make use of this to find the target driver module of the
device. This patch adds the missing platform alias of the module,
then module will be loaded automatically if device exists.
Before this patch:
[root@localhost tmp]# modinfo arm_smmuv3_pmu | grep alias
alias: of:N*T*Carm,smmu-v3-pmcgC*
alias: of:N*T*Carm,smmu-v3-pmcg
After this patch:
[root@localhost tmp]# modinfo arm_smmuv3_pmu | grep alias
alias: platform:arm-smmu-v3-pmcg
alias: of:N*T*Carm,smmu-v3-pmcgC*
alias: of:N*T*Carm,smmu-v3-pmcg
Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/20230814131642.65263-1-yangyicong@xxxxxxxxxx
Signed-off-by: Will Deacon <will@xxxxxxxxxx>
drivers/perf/arm_smmuv3_pmu.c | 1 +
1 file changed, 1 insertion(+)
This one seems to make a bit more sense, and reverting it does make the
prints go away. Feels like either the driver is getting triggered
earlier, or wasn't getting triggered before at all.
I plan to come back to this later in the week to dig more.
Or apparently 3 weeks later since life has a funny way of having other
plans.
Prior to the above change, the arm_smmuv3_pmu module could be loaded
manually via modprobe, and the same errors appeared. This looks like
an existing issue that was just made visible, rather than something
"newly" introduced.
arm_smmuv3_pmu is failing to obtain its second memory resource. It
binds to a platform device that is created by the IORT table parser
(drivers/acpi/arm64/iort.c).
arm_smmu_v3_pmcg_init_resources() has a relevant comment for this issue:
/*
* The initial version in DEN0049C lacked a way to describe register
* page 1, which makes it broken for most PMCG implementations; in
* that case, just let the driver fail gracefully if it expects to
* find a second memory resource.
*/
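One way to check which revision the firmware advertises for its PMCG nodes is to walk a raw dump of the IORT. A minimal Python sketch, assuming the node layout from the IORT specification (36-byte ACPI header, then node count and node array offset, with each node starting with type, length, and revision) and PMCG node type 0x05; the function name is made up for illustration.

```python
import struct
from typing import List

ACPI_HEADER_LEN = 36   # standard ACPI table header length
IORT_NODE_PMCG = 0x05  # PMCG node type per the IORT specification


def pmcg_revisions(table: bytes) -> List[int]:
    """Return the revision field of every PMCG node in a raw IORT table."""
    if table[:4] != b"IORT":
        raise ValueError("not an IORT table")
    # After the ACPI header: number of nodes (u32), offset to node array (u32)
    node_count, offset = struct.unpack_from("<II", table, ACPI_HEADER_LEN)
    revisions = []
    for _ in range(node_count):
        # Node header starts with: type (u8), length (u16), revision (u8)
        node_type, length, revision = struct.unpack_from("<BHB", table, offset)
        if length == 0:
            raise ValueError("corrupt node length")
        if node_type == IORT_NODE_PMCG:
            revisions.append(revision)
        offset += length
    return revisions
```

On Linux the raw table is normally readable as root from /sys/firmware/acpi/tables/IORT. A result containing 0 would match the case the comment describes, where no second memory resource can be provided.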
Checking the IORT implementation, we do advertise revision 0. I'm not
certain, but I'm guessing this spec update occurred after the last
firmware release for QDF2400. I believe a FW update is unlikely, so I
suspect the options are:
1. Ignore the errors
2. Disable the driver on this platform
3. Use the ACPI initrd table override feature to load a modified IORT
table that silences the errors
Probably not the resolution we'd like, but this does feel like a final
conclusion.
-Jeff