On Thu, Dec 8, 2022 at 4:03 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Thu, Dec 8, 2022 at 10:01 AM Yonghong Song <yhs@xxxxxxxx> wrote: > > > > > > > > On 12/7/22 6:19 PM, Connor O'Brien wrote: > > > On Wed, Nov 9, 2022 at 8:45 AM Yonghong Song <yhs@xxxxxxxx> wrote: > > >> > > >> > > >> > > >> On 11/8/22 6:41 PM, Connor O'Brien wrote: > > >>> Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF > > >>> mismatches are expected and module loading should proceed > > >>> anyway. Logging with pr_warn() on every one of these "benign" > > >>> mismatches creates unnecessary noise when many such modules are > > >>> loaded. Instead, limit logging to the case where a BTF mismatch > > >>> actually prevents a module form loading. > > >>> > > >>> Signed-off-by: Connor O'Brien <connoro@xxxxxxxxxx> > > >>> --- > > >>> kernel/bpf/btf.c | 7 ++++--- > > >>> 1 file changed, 4 insertions(+), 3 deletions(-) > > >>> > > >>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c > > >>> index 5579ff3a5b54..406370487413 100644 > > >>> --- a/kernel/bpf/btf.c > > >>> +++ b/kernel/bpf/btf.c > > >>> @@ -7190,11 +7190,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op, > > >>> } > > >>> btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size); > > >>> if (IS_ERR(btf)) { > > >>> - pr_warn("failed to validate module [%s] BTF: %ld\n", > > >>> - mod->name, PTR_ERR(btf)); > > >> > > >> I think such warning still useful even with > > >> CONFIG_MODULE_ALLOW_BTF_MISMATCH. > > >> Can we use pr_warn_ratelimited instead of pr_warn in the above to > > >> avoid excessive warnings? > > > > > > I gave this a try on a Pixel 6 but I'm not sure it quite addresses the > > > issue. The amount of logging doesn't seem to decrease much, I think > > > because the interval between loading one mismatched module and the > > > next can be greater than the default rate limit. To my mind, the issue > > > is the total volume of these messages more so than their rate. > > > > > > For context, Android devices using the GKI may load hundreds of > > > separately-built modules, and BTF mismatches are possible for any/all > > > of these. It was pointed out to me that btf_verifier_log_type can also > > > print several more lines per mismatched module. ~5 lines of logging > > > for each mismatched module can start to add up, in terms of both > > > overhead and the noise added to the kernel logs. > > > > > > This is more subjective but I think the warnings also read as though > > > this is a more serious failure that might prevent affected modules > > > from working correctly; anecdotally, I've gotten multiple questions > > > about them asking if something is broken. This can be a red herring > > > for anyone unfamiliar with BTF who is reading the logs to debug > > > unrelated issues. In the CONFIG_MODULE_ALLOW_BTF_MISMATCH=y case the > > > flood of warnings seems out of proportion to the actual result > > > (modules still load successfully, just without debug info) especially > > > since the user has explicitly enabled a config saying they expect > > > mismatches. > > > > > > If there needs to be some logging in the "mismatch allowed" case, > > > could an acceptable middle ground be to use pr_warn_once to send a > > > > So it looks like pr_warn_ratelimited still produces a lot of warning. > > In this case, I guess pr_warn_once should be okay. > > Maybe pr_warn_once generic "some kernel module BTF mismatched". And > also warn per-module message with details if > CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set? > Yeah, I figured the logging behavior should remain unchanged when CONFIG_MODULE_ALLOW_BTF_MISMATCH is unset, the existing verbose logging makes sense when every mismatch stops a module from loading. Then for the pr_warn_once case something like "Kernel module BTF mismatch detected, BTF debug info may be unavailable for some modules" > > > > > single message reporting that mismatches were detected & module BTF > > > debug info might be unavailable? Alternatively, if we could opt out of > > > module BTF loading then that would also avoid this issue, but that's > > > already been proposed before ([1], [2]) so I thought working with the > > > existing config option might be preferred. > > > > > > [1] https://lore.kernel.org/bpf/20220209052141.140063-1-connoro@xxxxxxxxxx/ > > > [2] https://lore.kernel.org/bpf/20221004222725.2813510-1-sdf@xxxxxxxxxx/