Hi Masahiro,

On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > Hello, maintainers of the kbuild test robot.
> >
> > I have a proposal for the 0day tests.
>
> Thanks a lot for the proposal for the shuffle make, we will do some
> investigation to try this random order parallel build. The gnu make
> we currently use is 4.3, we will try the 4.4 to see any problem.
>
> For the timeline, we may provide update later this month.

We've upgraded to make v4.4.1 in the kernel test robot and enabled
random-order parallel compiling in our randconfig build tests. The
shuffle seed is generated by hashing the randconfig, so it changes over
time and can cover various random orders. We are still doing some
internal testing and will put it online once everything is done.

> >
> >
> > GNU Make traditionally processes the dependency from left to right.
> >
> > For example, if you have dependency like this:
> >
> >   all: foo bar baz
> >
> > GNU Make builds foo, bar, baz, in this order.
> >
> >
> > Some projects that are not capable of parallel builds
> > rely on that behavior implicitly.
> >
> > Kbuild, however, is intended to work well in parallel.
> > (As the maintainer, I really care about it.)
> >
> >
> > From time to time, people add "just worked for me" code,
> > but apparently that lacks proper dependency.
> > Sometimes it requires an expensive CPU to reproduce
> > parallel build issues.
> >
> >
> > For example, see this report,
> > https://lkml.org/lkml/2016/11/30/587
> >
> > The report says 'make -j112' reproduces the broken parallel build.
> > Most people do not have such a build machine that comes with 112
> > cores.
> > It is difficult to reproduce it (or even notice it).
> >
> > (Some time later, it was root-caused by 07a422bb213a)

Thanks a lot for sharing this case.
We tried to reproduce it, but it looks like it dates back to v4.9-rc7
and throws some other errors when compiling in our kbuild env, so we
have not been able to reproduce it yet. Not sure if that is related to
the toolchain/compiler version or the kernel config.

This case mentioned that 'make -j112' can reproduce the breakage. We
assume this was under the traditional serial-order build. Does it imply
that far fewer parallel jobs are likely to be needed to reproduce the
breakage when shuffle is set, say 'make --shuffle=SEED -j32', so that
developers can reproduce it on an ordinary CPU with fewer cores?

Not sure if there are other known cases of parallel build breakage
(especially in recent kernels). If there are any, it would be very kind
if you could also share them. We can first try reproducing them in the
bot to confirm that our test flow works well.

Another question is about bisection. Say the bot catches a breakage on
commit1 which is root-caused to a previous commit2. If we keep the
options "--shuffle=<seed> -j<jobs>" consistent during the whole
bisection, will the breakage show up 100% of the time on all the
commits between commit2 and commit1, or is it only sometimes
reproducible, so that it may not show up on every commit during
bisection?

Thanks a lot for this parallel building proposal, and we will keep
updating the status.

--
Best Regards,
Yujie Liu

> >
> >
> > GNU Make 4.4 got this option.
> >
> >   --shuffle[={SEED|random|reverse|none}]
> >        Perform shuffle of prerequisites and goals.
> >
> >
> >
> > 'make --shuffle=reverse' will build in reverse order.
> > In the example above, baz, bar, foo.
> >
> > 'make --shuffle' will randomize the build order.
> >
> >
> > If there exists a missing dependency among foo, bar, baz,
> > it will fail to build.
> >
> >
> >
> > We already perform the randconfig daily basis.
> > So, random-order parallel building is a similar idea.
> >
> > Perhaps, it makes sense to add the "--shuffle=SEED" option
> > but it requires GNU Make 4.4.
> > (or GNU Make 4.4.1)
> >
> > Is this too new?
>
> Our production environment is 4.3 right now. It will take extra
> time for us to upgrade the environment but it's doable for us.
>
> >
> >
> >
> > --
> > Best Regards
> > Masahiro Yamada
>
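
For reference, here is a minimal sketch of how a shuffle seed could be
derived by hashing a randconfig, so that the same config always gives
the same build order and a failure can be replayed. This is only an
illustration, not the bot's actual code: the function names, the choice
of SHA-256, and the truncation to 32 bits are all assumptions.

```python
import hashlib


def shuffle_seed(config_bytes):
    """Map the contents of a .config to a deterministic numeric seed.

    Assumption: SHA-256 truncated to 32 bits; the real bot may hash
    differently.
    """
    digest = hashlib.sha256(config_bytes).digest()
    return int.from_bytes(digest[:4], "big")


def make_command(config_bytes, jobs):
    """Build the make argument list for a reproducible shuffled build."""
    seed = shuffle_seed(config_bytes)
    return ["make", f"--shuffle={seed}", f"-j{jobs}"]


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    print(" ".join(make_command(b"CONFIG_SMP=y\n", 32)))
```

Reusing the printed command with the same config keeps the seed and job
count fixed, which is the setup assumed in the bisection question above.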