On Wed, Mar 27, 2024 at 09:59:01AM +0200, Jani Nikula wrote: > On Wed, 27 Mar 2024, Maxime Ripard <mripard@xxxxxxxxxx> wrote: > > On Tue, Mar 26, 2024 at 03:56:50PM -0700, Nathan Chancellor wrote: > >> On Tue, Mar 05, 2024 at 11:07:36AM +0200, Jani Nikula wrote: > >> > Add kconfig to enable -Werror subsystem wide. This is useful for > >> > development and CI to keep the subsystem warning free, while avoiding > >> > issues outside of the subsystem that kernel wide CONFIG_WERROR=y might > >> > hit. > >> > > >> > v2: Don't depend on COMPILE_TEST > >> > > >> > Reviewed-by: Hamza Mahfooz <hamza.mahfooz@xxxxxxx> # v1 > >> > Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx> > >> > --- > >> > drivers/gpu/drm/Kconfig | 13 +++++++++++++ > >> > drivers/gpu/drm/Makefile | 3 +++ > >> > 2 files changed, 16 insertions(+) > >> > > >> > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > >> > index 6e853acf15da..c08e18108c2a 100644 > >> > --- a/drivers/gpu/drm/Kconfig > >> > +++ b/drivers/gpu/drm/Kconfig > >> > @@ -416,3 +416,16 @@ config DRM_LIB_RANDOM > >> > config DRM_PRIVACY_SCREEN > >> > bool > >> > default n > >> > + > >> > +config DRM_WERROR > >> > + bool "Compile the drm subsystem with warnings as errors" > >> > + depends on EXPERT > >> > + default n > >> > + help > >> > + A kernel build should not cause any compiler warnings, and this > >> > + enables the '-Werror' flag to enforce that rule in the drm subsystem. > >> > + > >> > + The drm subsystem enables more warnings than the kernel default, so > >> > + this config option is disabled by default. > >> > + > >> > + If in doubt, say N. > >> > >> While I understand the desire for an easy switch that maintainers and > >> developers can use to ensure that their changes are warning free for the > >> drm subsystem specifically, I think subsystem specific configuration > >> options like this are actively detrimental to developers and continuous > >> integration systems that build test the entire kernel. For example, we > >> turned off CONFIG_WERROR for our Hexagon builds because of warnings that > >> appear with -Wextra that are legitimate but require treewide changes to > >> resolve in a manner sufficient for Linus: > >> > >> https://github.com/ClangBuiltLinux/linux/issues/1285 > >> https://lore.kernel.org/all/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@xxxxxxxxxxxxxx/ > >> https://lore.kernel.org/all/20230522105049.1467313-1-schnelle@xxxxxxxxxxxxx/ > >> > >> But now, due to CONFIG_DRM_WERROR getting enabled by all{mod,yes}config > >> and -Wextra being unconditionally enabled for DRM, those warnings hard > >> break the build despite CONFIG_WERROR=n... > > > > Would making DRM_WERROR depends on WERROR address your concerns? > > But then what would be the point of having DRM_WERROR at all? For me the > point is, "werror in drm, ignore the rest, they're someone else's > problem". Right, I do think this is a valid view point and one I am sympathetic to, especially since it is in the pursuit of increased code quality. I do not want to disrupt that. > An alternative would be to "depends on !COMPILE_TEST" that we have in > i915, but then some folks want to have COMPILE_TEST in drm, because some > drivers are otherwise hard for people to build. Right. I think it is unfortunate how (at least in my opinion) CONFIG_COMPILE_TEST has two meanings: genuinely just compile testing or "allmodconfig". For the first case, we would want CONFIG_DRM_WERROR=y but for the second case, it would be nice for CONFIG_DRM_WERROR to default to off (because CONFIG_WERROR is enabled) but allow developers to turn it on explicitly. Another lofty/wistful idea to solve this would be to implement something similar to compiler diagnostic groups for Kconfig, where there would be a hierarchy like CONFIG_WERROR CONFIG_DRM_WERROR CONFIG_SUBSYSTEM_A_WERROR CONFIG_SUBSYSTEM_B_WERROR where the value of CONFIG_WERROR is the same value for all subconfigurations under it but they could still be enabled individually without any additional dependencies (ala something like '-Wno-unused -Wunused-variable'), which would allow my use case of CONFIG_WERROR=n removing all instances of -Werror to continue to work but allow other developers and CI systems to just set their specific -Werror configuration and be done with it. I don't think something like that exists but maybe I don't know Kconfig as well as I think I do :) > Nathan, we do want to fix any issues switfly. Are you hitting specific > build problems? Yes, I see three distinct set of problems from our CI as a direct result of this series. I already covered two in the prior mail but I'll be a little more expansive below. 1. Instances of -Wunused-but-set-variable from variables that only have unary operations applied to them. Clang can warn in this case where GCC cannot: https://godbolt.org/z/d368q3coP int main(void) { int a = 0; a++; return 0; } which shows up in a few drm drivers. Most have a patch on the mailing list that has not been applied. drivers/gpu/drm/qxl/qxl_cmd.c:424:6: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] 424 | int count = 0; | ^ https://lore.kernel.org/all/20230408165023.2706235-1-trix@xxxxxxxxxx/ (almost a year old) drivers/gpu/drm/qxl/qxl_ioctl.c:148:14: error: variable 'num_relocs' set but not used [-Werror,-Wunused-but-set-variable] 148 | int i, ret, num_relocs; | ^ https://lore.kernel.org/all/20240307104119.1980621-1-colin.i.king@xxxxxxxxx/ drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c:843:6: error: variable 'out' set but not used [-Werror,-Wunused-but-set-variable] 843 | u64 out = dumper->iova + A6XX_CD_DATA_OFFSET; | ^ https://lore.kernel.org/all/20240326212324.185832-1-ojeda@xxxxxxxxxx/ (recent patch) drivers/gpu/drm/panthor/panthor_sched.c:2048:6: error: variable 'csg_mod_mask' set but not used [-Werror,-Wunused-but-set-variable] 2048 | u32 csg_mod_mask = 0, free_csg_slots = 0; | ^ No patch, new driver, not reported yet it seems. 2. High stack usage in AMDGPU files for ARCH=powerpc allmodconfig. This might be a compiler issue but until now, there have been more important fires. drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1096:12: error: stack frame size (2064) exceeds limit (2048) in 'vcn_v3_0_start' [-Werror,-Wframe-larger-than] 1096 | static int vcn_v3_0_start(struct amdgpu_device *adev) | ^ drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:955:12: error: stack frame size (2112) exceeds limit (2048) in 'vcn_v4_0_5_start' [-Werror,-Wframe-larger-than] 955 | static int vcn_v4_0_5_start(struct amdgpu_device *adev) | ^ drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:713:12: error: stack frame size (2992) exceeds limit (2048) in 'vcn_v5_0_0_start' [-Werror,-Wframe-larger-than] 713 | static int vcn_v5_0_0_start(struct amdgpu_device *adev) | ^ Taking a brief look at it while writing this email, it appears related to CONFIG_UBSAN_BOUNDS, as none of the warnings appear when that is disabled on top of allmodconfig. I suspect that the sanitizer instrumentation and inlining might be messing something up here, it has happened with other sanitizers like KASAN and KCSAN in the past. Without CONFIG_UBSAN_BOUNDS, the stack usage of these functions does not seem too bad: drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1096:12: warning: stack frame size (816) exceeds limit (512) in 'vcn_v3_0_start' [-Wframe-larger-than] drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:955:12: warning: stack frame size (800) exceeds limit (512) in 'vcn_v4_0_5_start' [-Wframe-larger-than] drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:713:12: warning: stack frame size (1040) exceeds limit (512) in 'vcn_v5_0_0_start' [-Wframe-larger-than] 3. -Wnull-pointer-arithmetic from IO port accessors on architectures that do not have them (such as hexagon and s390). For example: In file included from drivers/gpu/drm/virtio/virtgpu_plane.c:26: In file included from include/drm/drm_atomic_helper.h:31: In file included from include/drm/drm_crtc.h:32: In file included from include/drm/drm_modes.h:33: In file included from include/drm/drm_connector.h:32: In file included from include/drm/drm_util.h:35: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:11: In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/hexagon/include/asm/io.h:328: include/asm-generic/io.h:584:33: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] 584 | __raw_writeb(value, PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:594:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] 594 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); | ~~~~~~~~~~ ^ include/asm-generic/io.h:604:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] 604 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); | ~~~~~~~~~~ ^ These warnings are numerous throughout drivers/gpu/drm/ because this warning is enabled with -Wextra. Again, this is not exactly your problem and it should eventually be fixed by [1] (it appears that Niklas is working on a new version at [2]) but it is exacerbated by the default combo of W=1 + -Werror for DRM with allmodconfig now, even with CONFIG_WERROR=n. Hopefully that helps clear things up. I am more than happy to send patches or work towards solutions that satisfies everyone (or at least a majority/consensus). Wider testing with clang never hurts as well but I understand increasing build matrices is not always an easy sell. [1]: https://lore.kernel.org/all/20230522105049.1467313-45-schnelle@xxxxxxxxxxxxx/ [2]: https://git.kernel.org/niks/l/has_ioport_v6 Cheers, Nathan