Alejandro Colomar <alx.manpages@xxxxxxxxx> writes:

> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
>
> Hi Alexis,
>
> Please keep (at least) linux-man@ in the loop.
>
> On 4/9/23 08:44, Alexis wrote:
>>
>> As a related data point, i'd like to mention Gentoo's position on
>> this, i.e. that man pages will continue to be bzip2-compressed by
>> default:
>>
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>>
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
>
> As Ingo said[1] 3 years ago, I don't think in this year it makes any
> sense to compress pages anymore. However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2]. While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
>
> With this, I was able to test a bit more than what I did yesterday:
>
>
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M /opt/local/man/bz2
> 5.5M /opt/local/man/gz_
> 9.4M /opt/local/man/man
>
>
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
>
>
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
>
>
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
>
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
>
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's). It's not a thing of
> using dash(1), because I get similar times with bash(1) and its
> builtin time(1).
>
> Important note: Sam, are you sure you want your pages compressed
> with bz2? Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages? I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.
>
> Test results:
>
> - man-db's man(1) is slower with plain man(7) source than with .gz
>   pages for some mysterious reason.
>
> - man-db's man(1) is turtle slow with .bz2 pages.

I started looking into switching to xz (or at least to something other
than bz2), partly motivated by https://gitlab.com/man-db/man-db/-/issues/4
and partly out of local interest (I haven't done measurements to see
whether it would be worth a global change). The xz maintainer ended up
recommending an entirely different approach from the way man-db
currently handles external utilities (I have a draft of that). The xz
author also had some suggestions on the best parameters to use for man
pages, which I still need to dig up and look into.

https://bugs.gentoo.org/169260 was an interesting discussion about our
choice of bz2 (it also came up a bit in https://bugs.gentoo.org/372653).

(I'll get back and read the rest of the thread later, but wanted to add
this tidbit.)

Definitely surprised to learn bz2 is *that* bad, though!

best,
sam
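For anyone trying to reproduce the find(1)-based measurements above on their own machine, here is a minimal sketch of a single search loop that handles .gz, .bz2, .xz and uncompressed pages in one tree by picking the decompressor from the file suffix. The function name `search_pages` is hypothetical. The sketch assumes gzip, bzip2 and xz are installed and that page file names contain no newlines. It uses `grep -qF` (fixed-string match, quiet, stops at the first hit) instead of `grep -l ... >/dev/null`. It is only a stand-in for the pipelines above, not a description of how man-db's `man -K` searches pages internally.

```shell
#!/bin/sh
# search_pages PATTERN DIR  (hypothetical helper, not part of man-db)
# Print every regular file under DIR whose decompressed contents
# contain the fixed string PATTERN. The decompressor is chosen from
# the file suffix; unknown suffixes are treated as plain man(7) source.
# Assumes file names contain no newlines.
search_pages() {
	pattern=$1
	dir=$2
	find "$dir" -type f | while IFS= read -r f; do
		case $f in
		*.gz)  decomp='gzip -dc' ;;
		*.bz2) decomp='bzip2 -dc' ;;
		*.xz)  decomp='xz -dc' ;;
		*)     decomp='cat' ;;
		esac
		# $decomp is left unquoted on purpose so that 'gzip -dc'
		# splits into a command and its option; the page is fed
		# on stdin. grep -q prints nothing, so only the file
		# name is echoed when the pattern is found.
		if $decomp <"$f" | grep -qF -- "$pattern"; then
			printf '%s\n' "$f"
		fi
	done
}
```

Usage, mirroring the commands above: `search_pages RLIMIT_NOFILE "$MANPATH" | wc -l`, wrapped in `/bin/time -f %e` if you want comparable timings.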