On Wed, 12 Apr 2023 09:13:13 +0100
Sam James <sam@xxxxxxxxxx> wrote:

> Alejandro Colomar <alx.manpages@xxxxxxxxx> writes:
>
> > [Added back linux-man@, and people that commented on this (sub)topic]
> > [Added Sam, I've got a question for you]
> >
> > Hi Alexis,
> >
> > Please keep (at least) linux-man@ in the loop.
> >
> > On 4/9/23 08:44, Alexis wrote:
> >>
> >> As a related data point, i'd like to mention Gentoo's position on
> >> this, i.e. that man pages will continue to be bzip2-compressed by
> >> default:
> >>
> >> "app-text/mandoc bzip2 support"
> >> https://bugs.gentoo.org/854267
> >>
> >> "Remove /usr/share/man from default inclusion list for docompress"
> >> https://bugs.gentoo.org/836367
> >
> > As Ingo said[1] 3 years ago, I don't think in this year it makes any
> > sense to compress pages anymore. However, since it's simple for me
> > to add support for that, and it can be interesting for testing
> > purposes, I added support for installing the Linux man-pages
> > compressed with bzip2 using the Makefile[2]. While I was at it, I
> > also added support for generating .tar.bz2 release tarballs[3].
> >
> > With this, I was able to test a bit more than what I did yesterday:
> >
> >
> > $ sudo rm -rf /opt/local/man/
> > $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> > 2570
> > $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> > 2570
> > $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> > 2570
> > $ du -sh /opt/local/man/*
> > 5.4M /opt/local/man/bz2
> > 5.5M /opt/local/man/gz_
> > 9.4M /opt/local/man/man
> >
> >
> > $ export MANPATH=/opt/local/man/gz_/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 0.31
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.24
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.14
> >
> >
> > $ export MANPATH=/opt/local/man/bz2/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 10.90
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.33
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 1.31
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.21
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> > 17
> > 1.22
> >
> >
> > $ export MANPATH=/opt/local/man/man/share/man
> > $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> > 37
> > 0.56
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 0.01
> > $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> > 17
> > 0.01
> >
> > Weird thing: today, the symlink bug in man(1) was reproducible in
> > all kinds of pages, while yesterday it only reproduced in
> > uncompressed ones.
> >
> > Another weird thing: times today changed considerably for the
> > find(1) pipelines (half of yesterday's). It's not a thing of
> > using dash(1), because I get similar times with bash(1) and its
> > builtin time(1).
> >
> > Important note: Sam, are you sure you want your pages compressed
> > with bz2? Have you seen the 10 seconds it takes man-db's man(1) to
> > find a word in the pages? I suggest that at least you try to
> > reproduce these tests in your machine, and see if it's just me or
> > man-db's man(1) is pretty bad at non-gz pages.
> >
> > Test results:
> >
> > - man-db's man(1) is slower with plain man(7) source than with .gz
> >   pages for some mysterious reason.
> >
> > - man-db's man(1) is turtle slow with .bz2 pages.
>
> I started looking into changing to xz (or just.. not bz2, anyway),
> partially motivated by https://gitlab.com/man-db/man-db/-/issues/4 /
> just interest locally (without having done measurements to see if it
> would be worth a global change) and the xz maintainer ended up
> recommending a different implementation to how man-db currently handles
> external utilities entirely (which I have a draft of).
>
> The xz author had some suggestions on the best parameters to use
> for man pages too which I need to look into and dig up...
>
> https://bugs.gentoo.org/169260 was an interesting discussion
> about our choice of bz2 (it came up a bit in
> https://bugs.gentoo.org/372653 too).

Oh, I remember this.
Soon after #372653 was closed, I experimented further and found
xz --lzma2=preset=6e,pb=0 to be more effective than bzip2 -9, both in
terms of compression ratio and subsequent decompression performance, so
I used those settings for a time. Nowadays, I would be more concerned
with the time taken to render a man page than with reducing the
footprint of the installed documentation.

>
> (I'll get back and read the rest of the thread later, but wanted
> to add this tidbit.)
>
> Definitely surprised to learn bz2 is *that* bad though!
>
> best,
> sam

-- 
Kerin Millar
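
For anyone wanting to check the size side of the xz versus bzip2 comparison on their own pages, here is a minimal sketch. It is not from the thread: the function name compare_sizes is hypothetical, and compressing each page separately is an assumption about how pages get installed. It reports the total compressed bytes per format and skips any compressor that is not installed. It does not measure decompression or rendering time, which would need timing runs like the ones quoted above.

```shell
#!/bin/sh
# compare_sizes DIR  (hypothetical helper, not part of man-pages or man-db)
# Compress every regular file under DIR individually with each available
# tool and print "<format><TAB><total bytes>" per format.
compare_sizes() {
    src=$1
    for fmt in none gz bz2 xz; do
        case $fmt in
            none) comp='cat' ;;
            gz)   comp='gzip -9 -c' ;;
            bz2)  comp='bzip2 -9 -c' ;;
            # xz settings as mentioned in the message above.
            xz)   comp='xz --lzma2=preset=6e,pb=0 -c' ;;
        esac
        # Skip formats whose tool is not installed.
        command -v "${comp%% *}" >/dev/null 2>&1 || continue
        # $comp is left unquoted on purpose so it splits into command + args.
        find "$src" -type f | while IFS= read -r f; do
            $comp <"$f" | wc -c
        done | awk -v fmt="$fmt" '{ total += $1 } END { printf "%s\t%d\n", fmt, total }'
    done
}
```

It could be called as compare_sizes /opt/local/man/man/share/man to compare against the uncompressed tree from the quoted tests.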