On Fri, Jun 9, 2023 at 5:41 PM Liu, Yujie <yujie.liu@xxxxxxxxx> wrote:
>
> Hi Masahiro,
>
> On Fri, 2023-05-12 at 15:09 +0800, Philip Li wrote:
> > On Fri, May 12, 2023 at 12:25:13PM +0900, Masahiro Yamada wrote:
> > > Hello, maintainers of the kbuild test robot.
> > >
> > > I have a proposal for the 0day tests.
> >
> > Thanks a lot for the proposal for the shuffle make, we will do some
> > investigation to try this random order parallel build. The gnu make
> > we currently use is 4.3, we will try the 4.4 to see any problem.
> >
> > For the timeline, we may provide update later this month.
>
> We've upgraded to make v4.4.1 in kernel test robot and enabled
> random-order parallel compiling in our randconfig build tests. The
> shuffle seed is generated by hashing the randconfig, so it changes
> overtime and can cover various random orders. We are still doing some
> internal testing and will put it online once everything is done.
>
> > >
> > >
> > > GNU Make traditionally processes the dependency from left to right.
> > >
> > > For example, if you have dependency like this:
> > >
> > >   all: foo bar baz
> > >
> > > GNU Make builds foo, bar, baz, in this order.
> > >
> > >
> > > Some projects that are not capable of parallel builds
> > > rely on that behavior implicitly.
> > >
> > > Kbuild, however, is intended to work well in parallel.
> > > (As the maintainer, I really care about it.)
> > >
> > >
> > > From time to time, people add "just worked for me" code,
> > > but apparently that lacks proper dependency.
> > > Sometimes it requires an expensive CPU to reproduce
> > > parallel build issues.
> > >
> > >
> > > For example, see this report,
> > > https://lkml.org/lkml/2016/11/30/587
> > >
> > > The report says 'make -j112' reproduces the broken parallel build.
> > > Most people do not have such a build machine that comes with 112
> > > cores.
> > > It is difficult to reproduce it (or even notice it).
> > >
> > > (Some time later, it was root-caused by 07a422bb213a)
>
> Thanks a lot for sharing this case. We tried to reproduce it, but looks
> it dates back to v4.9-rc7 and throws some other errors when compiling
> in our kbuild env, so we are not able to reproduce it yet. Not sure if
> it is related with toolchain/compiler version or the kernel config.
>
> This case mentioned that 'make -j112' can reproduce the breakage. We
> assume this is under traditional serial order build. Does it imply that
> it is likely to take much less parallel jobs to reproduce the breakage
> when shuffle is set, say 'make --shuffle=SEED -j32', so developers are
> able to reproduce it on an ordinary CPU with less cores?

I think --shuffle will help a build machine with fewer cores
catch issues, but it is not a full randomization.

In my understanding, --shuffle still traverses depth-first.

Consider this example.

  all: foo bar
  foo: foo-sub
  bar: bar-sub

Only [1] or [2] can happen.

  [1] foo-sub -> foo -> bar-sub -> bar -> all
  [2] bar-sub -> bar -> foo-sub -> foo -> all

  foo-sub -> bar-sub -> bar -> foo -> all

is also a valid order, but --shuffle never schedules like that.

> Not sure if there are other known cases of parallel build breakage
> (especially in recent kernels). If any, it would be very kind if you
> could also share them. We can first try reproducing them in the bot to
> confirm our test flow works well.

I do not remember any other real breakage.

>
> Another question is about bisection. Say the bot catches a breakage on
> commit1 which root-caused to a previous commit2. If we keep the options
> "--shuffle=<seed> -j<jobs>" consistent during the whole process of
> bisection, will the breakage 100% show up on all the commits between
> commit2 and commit1, or it is kind of possible to reproduce the
> breakage, but not 100% reproducible on every commit during bisection?

I am not sure, but I _guess_ git-bisect may not point to commit2
if there is a Makefile change in between.

    commit2 (root cause)
 -> commitA (add Makefile change)
 -> commit1 (0 day bot noticed an issue here)

Even if the same --shuffle=SEED is given, the issue may not be
reproducible on commit2..commitA if commitA changes a Makefile.

Thanks for considering this.

> Thanks a lot for this parallel building proposal, and we will keep
> updating the status.
>
> --
> Best Regards,
> Yujie Liu
>
> > >
> > >
> > > GNU Make 4.4 got this option.
> > >
> > >   --shuffle[={SEED|random|reverse|none}]
> > >       Perform shuffle of prerequisites and goals.
> > >
> > >
> > >
> > > 'make --shuffle=reverse' will build in reverse order.
> > > In the example above, baz, bar, foo.
> > >
> > > 'make --shuffle' will randomize the build order.
> > >
> > >
> > > If there exists a missing dependency among foo, bar, baz,
> > > it will fail to build.
> > >
> > >
> > >
> > > We already perform the randconfig daily basis.
> > > So, random-order parallel building is a similar idea.
> > >
> > > Perhaps, it makes sense to add the "--shuffle=SEED" option
> > > but it requires GNU Make 4.4. (or GNU Make 4.4.1)
> > > Is this too new?
> >
> > Our production environment is 4.3 right now. It will take extra
> > time for us to upgrade the environment but it's doable for us.
> >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Masahiro Yamada
> > >

--
Best Regards
Masahiro Yamada
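
P.S. The depth-first point about --shuffle in this mail can be illustrated
with a small Python sketch. It is only a toy model, not GNU Make's real
scheduler: it assumes a serial build (-j1) and assumes that shuffling merely
permutes each target's prerequisite list before the usual depth-first walk.
The function names depth_first_orders() and valid_orders() are hypothetical
helpers made up for this illustration.

```python
from itertools import permutations

# Dependency graph from the example in this mail:
#   all: foo bar
#   foo: foo-sub
#   bar: bar-sub
DEPS = {
    "all": ["foo", "bar"],
    "foo": ["foo-sub"],
    "bar": ["bar-sub"],
}


def depth_first_orders(target, deps):
    """Every serial build order reachable by permuting each target's
    prerequisite list and then walking the graph depth-first.
    (Assumed model of --shuffle, not taken from the GNU Make source.)"""
    prereqs = deps.get(target, [])
    if not prereqs:
        return {(target,)}
    orders = set()
    for perm in permutations(prereqs):
        partial = {()}
        for p in perm:
            # Finish the whole subtree of p before touching the next one.
            partial = {
                done + tuple(t for t in sub if t not in done)
                for done in partial
                for sub in depth_first_orders(p, deps)
            }
        orders |= {done + (target,) for done in partial}
    return orders


def valid_orders(deps):
    """Every order in which each prerequisite precedes its target."""
    nodes = set(deps) | {p for ps in deps.values() for p in ps}
    result = set()
    for perm in permutations(sorted(nodes)):
        pos = {n: i for i, n in enumerate(perm)}
        if all(pos[p] < pos[t] for t, ps in deps.items() for p in ps):
            result.add(perm)
    return result


reachable = depth_first_orders("all", DEPS)
for order in sorted(valid_orders(DEPS)):
    mark = "reachable" if order in reachable else "never scheduled"
    print(" -> ".join(order), f"({mark})")
```

Under these assumptions, only orders [1] and [2] are marked reachable, while
the interleaved order foo-sub -> bar-sub -> bar -> foo -> all is valid but
never scheduled. With -jN the jobs can of course overlap in time, which this
serial model does not capture.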