Poor performance of man -K for uncompressed pages (was: man -K finds repeated entries for each symlink page)

Hi Colin,

On 4/9/23 16:55, Colin Watson wrote:
> On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
>> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>>       3 /opt/local/man/share/man/man2/dup.2
>>       2 /opt/local/man/share/man/man2/fcntl.2
>>       5 /opt/local/man/share/man/man2/getrlimit.2
>>       3 /opt/local/man/share/man/man2/open.2
>>       1 /opt/local/man/share/man/man2/pidfd_getfd.2
>>       1 /opt/local/man/share/man/man2/pidfd_open.2
>>       2 /opt/local/man/share/man/man2/poll.2
>>       1 /opt/local/man/share/man/man2/seccomp_unotify.2
>>       4 /opt/local/man/share/man/man2/select.2
>>
>> Those numbers coincide with 1+ the number of symlinks for each of the
>> pages.  For example, see select.2:
> 
> Thanks for the report.  Fixed by this commit:
> 
>   https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993

Heh, that was fast :)

As a side effect of not reading too many files, performance improved
considerably for bzip2 (~3x), and for gzip (~2x).

I built man from source (tweaking with -O3, so I cheated a little bit),
and here are the results:


$ export MANPATH=/tmp/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


$ export MANPATH=/tmp/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
3.05
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.20


$ export MANPATH=/tmp/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.52
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01


Please consider this a new bug report, about performance.  See the last
block of commands.  man(1) takes half a second, while my pipeline with
find(1) and grep(1) is barely measurable.  I could understand that
man(1) has some overhead, but 52x feels like there's some serious
performance problem; especially when man(1) is faster reading
gzip-compressed pages (0.19 s; see at the top).
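
For anyone who wants to reproduce the comparison, here is a minimal
sketch of the reference searches above, rewritten to cope with file
names containing spaces.  The function names grep_plain and
grep_filtered are made up for this example, and it uses fixed-string
matching (grep -F), which the commands above did not.  It only
approximates what I ran, and says nothing about how man(1) itself
searches.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, not part of man-db.

# grep_plain DIR STRING
# List uncompressed files under DIR that contain the fixed string STRING.
grep_plain() {
    find "$1" -type f -exec grep -lF -e "$2" {} +
}

# grep_filtered DIR STRING CMD [ARGS...]
# Same, but pipe each file through a decompressor that reads stdin and
# writes stdout, e.g. "gzip -dc" or "bzip2 -dc".
grep_filtered() {
    dir=$1
    str=$2
    shift 2
    find "$dir" -type f | while IFS= read -r f; do
        # grep -q may exit early; hide the decompressor's SIGPIPE noise.
        if "$@" <"$f" 2>/dev/null | grep -qF -e "$str"; then
            printf '%s\n' "$f"
        fi
    done
}
```

With the trees from the tests above, one would run, for example,
`grep_filtered /tmp/man/gz_/share/man RLIMIT_NOFILE gzip -dc | wc -l`
or `grep_plain /tmp/man/man/share/man RLIMIT_NOFILE | wc -l`, and time
them the same way as the man -K invocations.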


Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


