On Fri, Feb 24, 2023 at 9:13 PM André Almeida <andrealmeid@xxxxxxxxxx> wrote:
>
> Hi Masahiro,
>
> On 24/02/2023 02:38, Masahiro Yamada wrote:
> > On Thu, Feb 23, 2023 at 9:17 AM André Almeida <andrealmeid@xxxxxxxxxx> wrote:
> >>
> >> As it's done for zstd compression, enable multithread compression for
> >> xz to speed up module installation.
> >>
> >> Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
> >> ---
> >>
> >> On my setup xz is a bottleneck during module installation. Here are the
> >> numbers to install it in a local directory, before and after this patch:
> >>
> >> $ time make INSTALL_MOD_PATH=/home/tonyk/codes/.kernel_deploy/ modules_install -j16
> >> Executed in 100.08 secs
> >>
> >> $ time make INSTALL_MOD_PATH=/home/tonyk/codes/.kernel_deploy/ modules_install -j16
> >> Executed in 28.60 secs
> >
> >
> > Heh, this is an interesting benchmark.
> >
> > Without this patch, you ran 16 processes of 'xz' in parallel
> > since you gave -j16.
> >
> > You created multi-threads in each xz process, then you got 3x faster.
> > What made it happen?
> >
> >
>
> During the modules installation in my setup, the build system would
> spend most of its time compressing big modules (such as the 350M
> amdgpu.ko) in a single thread, with 15 idle threads. Enabling
> multithread allowed amdgpu to be compressed really fast.


It is a corner case, isn't it?

amdgpu.ko appears early in modules.order.
In most use-cases, other *.ko will fill the idle threads.


xz(1) says

    Setting threads to a special value 0 makes xz use up to as many
    threads as the processor(s) on the system support.

So, 'make -j$(nproc) modules_install' will have
(nproc * nproc) threads at maximum.

Of course, this is a theoretical calculation.
The actual number of spawned threads will be much less,
but spawning too many threads may not be nice.

For your case, Nathan's suggestion will do.

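
The (nproc * nproc) ceiling can be computed with a quick shell sketch. It assumes GNU coreutils `nproc` is available; the variable names are illustrative only, and the result is a theoretical upper bound, not a measurement of what xz actually spawns:

```shell
#!/bin/sh
# Theoretical upper bound on xz threads for 'make -j$(nproc) modules_install'
# when every xz process is started with -T0.
jobs=$(nproc)              # parallel make jobs requested with -j$(nproc)
threads_per_xz=$(nproc)    # with -T0, each xz may use up to one thread per CPU
max_threads=$((jobs * threads_per_xz))
echo "jobs=$jobs threads_per_xz=$threads_per_xz max_threads=$max_threads"
```

With the 16 CPUs reported in this thread the bound would be 256 threads, and with 24 CPUs it would be 576.
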
>
> The real performance improvement during modules compression is not
> compressing as many small modules as possible in parallel, but
> compressing the big ones in multithread, that proved to be the
> bottleneck in my setup.
>
> > How many threads can your system run?
>
> $ nproc
> 16
>
> >
> > I did not get such an improvement in my testing.
> > In my machine $(nproc) is 24.
> >
> >
> > [Without this patch]
> >
> > $ time make INSTALL_MOD_PATH=/tmp/inst1 modules_install -j$(nproc)
> >
> > real 0m33.965s
> > user 10m6.118s
> > sys 0m37.231s
> >
> > [With this patch]
> >
> > $ time make INSTALL_MOD_PATH=/tmp/inst1 modules_install -j$(nproc)
> >
> > real 0m32.568s
> > user 10m4.472s
> > sys 0m39.132s
> >
> >
>
> I can see that my patch did not introduce performance regressions to
> your setup, at least.
>
> >
> > Given that GNU Make provides the parallel execution environment,
> > you can control the number of processes of 'xz'.
> >
> > There is no point in forcing multi-threading, which the user
> > did not ask for or ever want.
> >
> >
>
> Should we drop -T0 from zstd then? It is currently forcing multi-threading.


I think yes.


--
Best Regards
Masahiro Yamada