[Added back linux-man@, and people that commented on this (sub)topic]
[Added Sam, I've got a question for you]

Hi Alexis,

Please keep (at least) linux-man@ in the loop.

On 4/9/23 08:44, Alexis wrote:
>
> As a related data point, i'd like to mention Gentoo's position on
> this, i.e. that man pages will continue to be bzip2-compressed by
> default:
>
> "app-text/mandoc bzip2 support"
> https://bugs.gentoo.org/854267
>
> "Remove /usr/share/man from default inclusion list for docompress"
> https://bugs.gentoo.org/836367

As Ingo said[1] 3 years ago, I don't think it makes any sense to
compress pages anymore these days.  However, since it's simple for me
to add support for that, and it can be interesting for testing
purposes, I added support for installing the Linux man-pages compressed
with bzip2 using the Makefile[2].  While I was at it, I also added
support for generating .tar.bz2 release tarballs[3].

With this, I was able to test a bit more than what I did yesterday:

$ sudo rm -rf /opt/local/man/
$ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
2570
$ du -sh /opt/local/man/*
5.4M	/opt/local/man/bz2
5.5M	/opt/local/man/gz_
9.4M	/opt/local/man/man

$ export MANPATH=/opt/local/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.24
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14

$ export MANPATH=/opt/local/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
10.90
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.33
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.22

$ export MANPATH=/opt/local/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
17
0.01

Weird thing: today, the symlink bug in man(1) was reproducible with all
kinds of pages, while yesterday it only reproduced with uncompressed
ones.

Another weird thing: times today changed considerably for the find(1)
pipelines (half of yesterday's).  It's not caused by using dash(1),
because I get similar times with bash(1) and its builtin time(1).

Important note: Sam, are you sure you want your pages compressed with
bz2?  Have you seen the 10 seconds it takes man-db's man(1) to find a
word in the pages?  I suggest that you at least try to reproduce these
tests on your machine, and see if it's just me or man-db's man(1) is
pretty bad at non-gz pages.

Test results:

- man-db's man(1) is slower with plain man(7) source than with .gz
  pages, for some mysterious reason.

- man-db's man(1) is turtle slow with .bz2 pages.

- xargs -P0 doesn't make a significant difference.  As Ralph said, this
  is probably because the main issue with find(1) was having the
  bottleneck in clone/fork+exec, and xargs(1) already solves that.
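
A minimal sketch of what that batching buys, not taken from the tests
above: it counts how many child processes get started when find(1) runs
one command per file (using -exec ... \; as a stand-in, which is an
assumption about the earlier slow runs) versus when xargs(1) packs the
file names together.  The scratch directory and the page names are made
up for illustration.

```shell
#!/bin/sh
# Sketch only: the scratch directory and page names are hypothetical.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Five empty stand-in "pages".
for i in 1 2 3 4 5; do
	: >"$tmp/page$i.3"
done

# One process per file: -exec ... \; forks and execs once for each
# match, so the child prints one line per file.
per_file=$(find "$tmp" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# Batched: xargs(1) packs the file names into as few invocations as fit
# on a command line; for five short names that is a single one.
batched=$(find "$tmp" -type f | xargs sh -c 'echo run' sh | wc -l)

echo "per-file invocations: $per_file"
echo "batched invocations:  $batched"
```

With only a handful of batches there is also little left for -P0 to run
in parallel, which would be consistent with the identical timings.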

Expanding the pipeline to use zcat(1) instead of zgrep(1) improves
things a little more, because the zgrep(1) script is probably quite
inefficient, while zcat(1) is just a simple wrapper around gzip(1).  We
see that zgrep(1) is less efficient than running a few programs per
file ourselves in a pipeline!  Calling gzip(1) directly is even faster,
since we avoid invoking a shell for such a small script.

Expanding the bzgrep(1) pipeline into one using bzcat(1) gives a similar
improvement.  However, since bzcat(1) is a binary, we don't get any
further improvement from calling bzip2(1) directly.

Cheers,
Alex

>
>
> Alexis.
>

[1]: <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>
[2]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>
[3]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5