Hi AMD folks, Top posting because it might not have been obvious but I was looking for your feedback on this message (which can be viewed on lore.kernel.org if you do not have the original [1]) so that we can try to get this fixed in some way for 6.0/6.1. If my approach is not welcome, please consider suggesting another one or looking to see if this is something you all could look into. [1]: https://lore.kernel.org/Yv5h0rb3AgTZLVJv@dev-arch.thelio-3990X/ Cheers, Nathan On Thu, Aug 18, 2022 at 08:59:14AM -0700, Nathan Chancellor wrote: > Hi Arnd, > > Doubling back around to this now since I think this is the only thing > breaking x86_64 allmodconfig with clang 11 through 15. > > On Fri, Aug 05, 2022 at 09:32:13PM +0200, Arnd Bergmann wrote: > > On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@xxxxxxxxxx> wrote: > > > On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote: > > > > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@xxxxxxx> wrote: > > > > While splitting out sub-functions can help reduce the maximum stack > > > > usage, it seems that in this case it makes the actual problem worse: > > > > I see 2168 bytes for the combined > > > > dml32_ModeSupportAndSystemConfigurationFull(), but marking > > > > mode_support_configuration() as noinline gives me 1992 bytes > > > > for the outer function plus 384 bytes for the inner one. So it does > > > > avoid the warning (barely), but not the problem that the warning tries > > > > to point out. > > > > > > I haven't had a chance to take a look at splitting things up yet, would > > > you recommend a different approach? > > > > Splitting up large functions can help when you have large local variables > > that are used in different parts of the function, and the split gets the > > compiler to reuse stack locations. > > > > I think in this particular function, the problem isn't actually local variables > > but either pushing variables on the stack for argument passing, > > or something that causes the compiler to run out of registers so it > > has to spill registers to the stack. > > > > In either case, one has to actually look at the generated output > > and then try to rearrange the codes so this does not happen. > > > > One thing to try would be to condense a function call like > > > > dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( > > > > &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport, > > mode_lib->vba.USRRetrainingRequiredFinal, > > mode_lib->vba.UsesMALLForPStateChange, > > > > mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb], > > mode_lib->vba.NumberOfActiveSurfaces, > > mode_lib->vba.MaxLineBufferLines, > > mode_lib->vba.LineBufferSizeFinal, > > mode_lib->vba.WritebackInterfaceBufferSize, > > mode_lib->vba.DCFCLK, > > mode_lib->vba.ReturnBW, > > mode_lib->vba.SynchronizeTimingsFinal, > > > > mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal, > > mode_lib->vba.DRRDisplay, > > v->dpte_group_bytes, > > v->meta_row_height, > > v->meta_row_height_chroma, > > > > v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters, > > mode_lib->vba.WritebackChunkSize, > > mode_lib->vba.SOCCLK, > > v->DCFCLKDeepSleep, > > mode_lib->vba.DETBufferSizeY, > > mode_lib->vba.DETBufferSizeC, > > mode_lib->vba.SwathHeightY, > > mode_lib->vba.SwathHeightC, > > mode_lib->vba.LBBitPerPixel, > > v->SwathWidthY, > > v->SwathWidthC, > > mode_lib->vba.HRatio, > > mode_lib->vba.HRatioChroma, > > mode_lib->vba.vtaps, > > mode_lib->vba.VTAPsChroma, > > mode_lib->vba.VRatio, > > mode_lib->vba.VRatioChroma, > > mode_lib->vba.HTotal, > > mode_lib->vba.VTotal, > > mode_lib->vba.VActive, > > mode_lib->vba.PixelClock, > > mode_lib->vba.BlendingAndTiming, > > .... /* more arguments */); > > > > into calling conventions that take a pointer to 'mode_lib->vba' and another > > one to 'v', so these are no longer passed on the stack individually. > > So I took a whack at reducing this function's number of parameters and > ended up with the attached patch. I basically just removed any > parameters that were identical between the two call sites and access them > through the vba pointer, as you suggested. > > AMD folks, is this an acceptable approach? It didn't take a trivial > amount of time so I want to make sure this is okay before I do it to > more functions/files. > > Due to the potential size of these changes, I am a little weary of them > going into 6.0; even though they should be a simple search and replace > for the most part, it might be nice for them to have some decent soak > time in -next. One solution would be to raise the warning limit for > these files on 6.0 so that allmodconfig does not ship broken then reduce > the limit for 6.1 once these patches have been applied. > > Additionally, I took a look at the stack usage across all compilers that > the kernel supports and I thought it was kind of interesting that the > usage really jumps from GCC 7 to 8, which I am guessing is a result of > commit e8a170ff9a35 ("drm/amdgpu: enable -msse2 for GCC 7.1+ users"). > GCC 8 allmodconfig actually errors now too: > > https://lore.kernel.org/alpine.DEB.2.22.394.2208152006320.289321@xxxxxxxxxxxxxx/ > > |-----------------------------------------------|-----------------------------------------------|-----------------------------------------------| > | dml30_ModeSupportAndSystemConfigurationFull() | dml31_ModeSupportAndSystemConfigurationFull() | dml32_ModeSupportAndSystemConfigurationFull() | > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------| > | GCC 5 | 1056 bytes | 656 bytes | 1040 bytes | > | GCC 6 | 1024 bytes | 672 bytes | 1056 bytes | > | GCC 7 | 1040 bytes | 664 bytes | 1056 bytes | > | GCC 8 | 1760 bytes | 1608 bytes | 2144 bytes | > | GCC 9 | 1664 bytes | 1392 bytes | 1960 bytes | > | GCC 10 | 1648 bytes | 1368 bytes | 1952 bytes | > | GCC 11 | 1680 bytes | 1400 bytes | 1952 bytes | > | GCC 12 | 1680 bytes | 1400 bytes | 1984 bytes | > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------| > | LLVM 11 | 2104 bytes | 2056 bytes | 2120 bytes | > | LLVM 12 | 2152 bytes | 2200 bytes | 2152 bytes | > | LLVM 13 | 2216 bytes | 2248 bytes | 2168 bytes | > | LLVM 14 | 2168 bytes | 2184 bytes | 2160 bytes | > | LLVM 15 | 2216 bytes | 2184 bytes | 2160 bytes | > | LLVM 16 | 2232 bytes | 2216 bytes | 2176 bytes | > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------| > > With the patch I have attached, > dml32_ModeSupportAndSystemConfigurationFull() drops from 2176 to 1944 > for LLVM 16, which is obviously still not great but it at least avoids > the warning. > > Cheers, > Nathan