Hi Colin,

On Sat, Nov 02, 2024 at 11:47:14PM +0000, Colin Watson wrote:
> On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> > This is quite naive, and will not work with pages that define their own
> > stuff, since this script is not groff(1).  But it should be as fast as
> > is possible, which is what Colin wants, is as simple as it can be (and
> > thus relatively safe), and should work with most pages (as far as
> > indexing is concerned, probably all?).
>
> I seem to be being invoked here for something I actually don't think I
> want at all, which suggests that wires have been crossed somewhere.  Can
> you explain why I'd want to replace some part of a fairly well-optimized
> and established C program with a shell pipeline?  I'm pretty certain it
> would not be faster, at least.

Are you sure?  With a small tweak, I get the following comparison:

	alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc
	lexgrog: can't resolve man7/groff_man.7
	  12475   99295  919842

	real	0m6.166s
	user	0m5.132s
	sys	0m1.336s
	alx@devuan:~/src/linux/man-pages/man-pages/main$ time mansect NAME man/ \
	| groff -man -Tutf8 | wc
	   9830   27109  689478

	real	0m0.156s
	user	0m0.219s
	sys	0m0.019s

Yes, I'm working with uncompressed pages.  We'd need to add support for
handling compressed pages.  Also, we'd need to compare the performance
of lexgrog(1) with compressed pages.  But for a starter, this suggests
some good performance.

(I say with a small tweak, because the version I've posted uses
xargs -L1, but I've tested for performance without the -L1, which is the
main bottleneck.  It has no consequences for the NAME.  I need to work
out some nasty details with sed -n1 for the generic version, though.)

Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>