On 4/9/23 14:05, Alejandro Colomar wrote:
> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
>
> Hi Alexis,
>
> Please keep (at least) linux-man@ in the loop.
>
> On 4/9/23 08:44, Alexis wrote:
>>
>> As a related data point, i'd like to mention Gentoo's position on
>> this, i.e. that man pages will continue to be bzip2-compressed by
>> default:
>>
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>>
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
>
> As Ingo said[1] 3 years ago, I don't think in this year it makes any
> sense to compress pages anymore.  However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2].  While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
>
> With this, I was able to test a bit more than what I did yesterday:
>
>
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M	/opt/local/man/bz2
> 5.5M	/opt/local/man/gz_
> 9.4M	/opt/local/man/man
>
>
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
>
>
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
>
>
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
>
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
>
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's).  It's not a thing of
> using dash(1), because I get similar times with bash(1) and its
> builtin time(1).
>
> Important note: Sam, are you sure you want your pages compressed
> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages?  I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.
>
> Test results:
>
> - man-db's man(1) is slower with plain man(7) source than with .gz
>   pages for some mysterious reason.
>
> - man-db's man(1) is turtle slow with .bz2 pages.
>
> - xargs -P0 doesn't affect the times significantly.  As Ralph said,
>   this is probably because the main issue with find(1) was having
>   the bottleneck in clone/fork+exec, and xargs(1) already solves
>   that.
>
>   Expanding the pipeline to use zcat(1) instead of zgrep(1)
>   improves a little bit more, because the zgrep(1) script is
>   probably quite inefficient, while zcat(1) is just a simple
>   wrapper around gzip(1).  We see that zgrep(1) is more
>   inefficient than running ourselves a few programs per file in a
>   pipeline!
>
>   Calling gzip(1) directly is even faster, since we avoid invoking
>   a shell for such a small script.
>
>   Expanding the bzgrep(1) pipeline into one using bzcat(1) has
>   similar improvements.  However, since bzcat(1) is a binary, we
>   don't get further improvement from calling bzip2(1) directly.

And I forgot the obvious one:

- Using plain man(7) source is blazingly fast.  So much that I don't
  miss mdoc(7)'s indexability so much.  However, I must admit that I
  do miss mdoc(7)'s power sometimes.  The man_lsfunc() and
  man_lsvar() functions for finding function prototypes and variable
  declarations in man(7) source would be much simpler using mdoc(7),
  and I could even use mandoc(1) to find such things.

>
>
> Cheers,
> Alex
>
>>
>>
>> Alexis.
>>
>
>
> [1]: <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>
>
> [2]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>
>
> [3]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>
>

--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
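
For anyone wanting to reproduce the per-file pipelines from the tests in this message across all three install trees, the loop can be sketched as one function that picks the decompressor from the file suffix. This is only a rough sketch: the names man_cat and man_grep are hypothetical (they are not part of man-pages or man-db), and it assumes gzip(1) and bzip2(1) are installed and that the suffix reliably indicates the compression.

```shell
#!/bin/sh
# Hypothetical helpers, not part of man-pages or man-db.

# Write the (decompressed) contents of one page to stdout,
# choosing the decompressor from the file suffix.
man_cat()
{
	case "$1" in
	*.gz)  gzip -dc <"$1" ;;
	*.bz2) bzip2 -dc <"$1" ;;
	*)     cat <"$1" ;;
	esac
}

# Print every regular file under directory $1 whose contents
# contain the fixed string $2.  Symlinks are skipped (-type f),
# as in the find(1) pipelines quoted in this message.
man_grep()
{
	find "$1" -type f |
	while IFS= read -r f; do
		if man_cat "$f" | grep -qF -e "$2"; then
			printf '%s\n' "$f"
		fi
	done
}
```

It would be called the same way for each tree, for example `man_grep /opt/local/man/bz2/share/man RLIMIT_NOFILE | wc -l`. Like the hand-written loops, it calls the decompressor binary directly rather than going through the zgrep(1) or bzgrep(1) wrappers, but it still starts a decompressor and a grep(1) per file.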