On 2023-08-12 14:02, Deri wrote:
On Saturday, 12 August 2023 18:02:24 BST Brian Inglis wrote:
On 2023-08-07 17:14, Deri wrote:
On Monday, 7 August 2023 09:45:22 BST Alejandro Colomar wrote:
Nevertheless, now I remember Deri told me he hardcoded a lot of stuff
for 1.22.4 which should be removed after the release of 1.23.0, so it
seems that the time has come to chop a lot of stuff from there.
Deri, would you mind simplifying the scripts assuming a Build-dep of
groff(>=1.23.0)?
Hi Alex, Brian,
I have done some work on building the pdf. One improvement is that any warnings
output by groff, e.g. use of the deprecated .PDF macro, now identify the
particular man page and line number accurately.
I have attached two new replacement LinuxManBook directories. The first,
1.23.0, will run on a stock groff 1.23.0 system. The second, 1.23.0+, runs
with the latest gropdf which has a number of advantages for this project -
you will find the resulting pdf to be more than 5 MB smaller, and the page
numbers in the overview pane match up with the page number at the bottom
of each page.
The file NewGropdf.pdf contains a description of some of the features in the
new gropdf.
Both of these should continue to work if the groff version changes, thanks
to Brian's helpful suggestion to include /usr/share/groff/current in the
font path, but I have achieved this by specifying it in an -F flag rather
than patching gropdf.
Nice work Deri!
The official 6.05.01 book hyphenates words across page breaks more than
standard 1.23.0 and new 1.23.0+ gropdf books.
I'd like to investigate this to understand why it is happening. Please can
you give me example page numbers which illustrate this?
Hi Deri,
Please see attached awk script and logs showing pages with end of page "hyphens"
in text of PDFs from `pdftotext -layout`: "official" PDF has 47, newer PDFs
break only at 5 compound word joins or double dashes.
File sizes: official 6.05.01 is ~13.3MB, about 200k more than standard 1.23.0
at ~13MB, which in turn is over 5MB more than new 1.23.0+ gropdf at under 8MB.
I now see page footers on all pages!
I noticed that new 1.23.0+ seems to set some lines, especially tables, a
little tighter (perhaps because of space handling), but *only* the first
page "intro(1)" has half the normal spacing from the page header to the
first heading!
Yes, I can see the difference in intro(1) and I can see a bug in the version
of an.tmac I provided which may affect hyphenation. Also a page number example
of the tighter table would be helpful.
The impression of tighter table spacing seems to be an artifact of more
consistent text and space rendering by pdftotext as pointed out in the diffs,
and as you explain below.
[I also noticed that *poppler* `pdftotext -layout` (used to diff the content
and layout) prints the .SH NAME and options dashes as en-dash from the
official 6.05.01 book, but prints minus from standard 1.23.0 and new
1.23.0+ gropdf.]
This is intentional (and probably desirable). The pdf has a mapping so that
the groff character \- is displayed as HYPHEN (U+2010) but when text is copy/
pasted from the pdf it is converted to HYPHEN-MINUS (U+002D) which is the
character you get when you hit hyphen on the keyboard. This means that if you
are copy/pasting from examples in the man page which include hyphens then
your shell will interpret them correctly.
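To illustrate why the copy/paste mapping matters, here is a minimal Python
sketch. It is not taken from gropdf or the build scripts: the helper name
to_shell_safe and the list of dash-like characters are assumptions made for
the example. It shows that HYPHEN and HYPHEN-MINUS are distinct code points,
and what a shell-safe normalisation of pasted text could look like.

```python
import unicodedata

# Dash-like characters that may end up in copied text (illustrative list,
# chosen for this example only).
DASH_LIKE = "\u2010\u2011\u2012\u2013\u2212"


def to_shell_safe(text: str) -> str:
    """Replace dash-like characters with HYPHEN-MINUS (U+002D)."""
    return text.translate({ord(ch): "-" for ch in DASH_LIKE})


if __name__ == "__main__":
    print(unicodedata.name("\u2010"))   # HYPHEN
    print(unicodedata.name("-"))        # HYPHEN-MINUS
    print(to_shell_safe("ls \u2010l"))  # ls -l
```

A shell only treats U+002D as the start of an option, so text pasted with
U+2010 in front of a flag would be passed along as an ordinary argument.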
I notice a number of widows and orphans, but that may be the man macros or
groff commands not checking for sufficient space left on the page before
rendering text: allowing 4em before heading spacing, 3em before para
spacing would probably help, at the cost of larger bottom margins; and
groff footers need to allow extra space to prevent widows by allowing them
to intrude.
This probably needs a bit of tender curation! Bear in mind that the
BuildLinuxMan.pl script uses the flags "-dpaper=a4 -P-pa4" so if the man page
author has designed for a different page size the widows/orphans may well be
different.
As a North American, can I change your uses of "p?a4" to letter in the script
and expect it to work?
I added a paper variable, made the changes, it seems to work, and reduces end of
page hyphens to one compound word instance in mbind(2); log attached:
nodemask ... on-
...
line, ...
There appear to be 24 single word instances of online and 12 outdated hyphenated
compound word instances of on-line across all man pages.
UI: I also noticed, while looking for tables to compare, that pages are ordered
by plain filename comparison, not in version order as with rpmvercmp, ls -v, or
RPM::VersionSort, e.g. ISO_8859-2 is after ISO_8859-16, which may not be as
expected.
Used rpmvercmp in last line of perl sub sortman and works as expected.
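The ordering difference can be sketched in a few lines of Python. This is
only an approximation of version-style ordering, not the rpmvercmp algorithm
itself (which has its own rules for separators and letters); the function
name natural_key is made up for the example.

```python
import re


def natural_key(name: str):
    """Split a name into text and digit runs so digit runs compare as numbers."""
    parts = re.split(r"(\d+)", name.lower())
    # re.split with a capturing group alternates text, digits, text, ...
    # so keys for different names always compare like with like.
    return [int(p) if p.isdigit() else p for p in parts]


if __name__ == "__main__":
    pages = ["ISO_8859-16", "ISO_8859-2"]
    print(sorted(pages))                   # plain string order: 16 before 2
    print(sorted(pages, key=natural_key))  # numeric runs: 2 before 16
```

With a plain string comparison "16" sorts before "2" because the characters
are compared one at a time; comparing the digit runs as integers gives the
order a reader would expect in the outline pane.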
Tech nitpick: .Z is still recognized by GUIs as compress output (UNIX-compressed
file) - is there no other file type suffix used for ditroff intermediate output?
Aha - Alex says .set:
https://lists.gnu.org/archive/html/groff/2023-04/msg00213.html
Added variables and changed those also in BLM-letter.pl: copy attached.
Thanks for your help.
Happy to help in any way.
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut
-- Antoine de Saint-Exupéry
#!/usr/bin/gawk -f
# lmb-eop-hy.awk - find LinuxManBook end of page hyphens
# on last non-blank line before footer

# save hyphenated line and skip to next
/-$/ {
    h = $0;
    next;
}

# if hyphenated line seen and footer found, print both lines, forget line seen
h && /^Linux\sman-pages\s6\.05\.01.*[0-9]+$/ {
    print h;
    print $0;
    h = "";
}

# if hyphenated line seen and non-blank line found, forget line seen
h && NF > 0 {
    h = "";
}
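For readers less familiar with awk, the same three rules can be sketched in
Python. This is an illustrative port, not part of the attachments: the
function name end_of_page_hyphens and the sample lines are assumptions, and
the footer pattern is copied from the awk script above.

```python
import re

# Footer pattern copied from the awk script: a man-pages 6.05.01 footer
# ending in a page number.
FOOTER = re.compile(r"^Linux\sman-pages\s6\.05\.01.*[0-9]+$")


def end_of_page_hyphens(lines):
    """Return (text_line, footer_line) pairs where the last non-blank line
    before a page footer ends in a hyphen."""
    hits = []
    pending = None
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("-"):
            pending = line                 # remember the hyphenated line
        elif pending is not None and FOOTER.search(line):
            hits.append((pending, line))   # footer directly follows it
            pending = None
        elif line.split():
            pending = None                 # other non-blank text: forget it
    return hits


if __name__ == "__main__":
    sample = [
        "set-user-ID or set-group-",
        "",
        "Linux man-pages 6.05.01      2023-05-03      135",
        "a line ending in a hyphen-",
        "followed by more text",
    ]
    for text, footer in end_of_page_hyphens(sample):
        print(text)
        print(footer)
```

Blank lines between the text and the footer are skipped, which is why the
awk script tests NF > 0 before forgetting the saved line.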
map_1 map_2 --| map_4 |--
Linux man-pages 6.05.01 2023-07-28 55
ter a successful execve(), and the process would gain privilege because the set-user-ID or set-group-
Linux man-pages 6.05.01 2023-05-03 135
To avoid corruption in multithreaded applications, mutexes are used internally to protect the memory-
Linux man-pages 6.05.01 2023-07-20 1415
drive that supports the LOCATE command (device-specific address) or a Tandberg-
Linux man-pages 6.05.01 2023-02-05 2073
stant Bandwidth Server). To set and fetch this policy and associated attributes, one must use the Linux-
Linux man-pages 6.05.01 2023-02-10 2649
map_1 map_2 --| map_4 |--
Linux man-pages 6.05.01 2023-07-28 55
ter a successful execve(), and the process would gain privilege because the set-user-ID or set-group-
Linux man-pages 6.05.01 2023-05-03 135
To avoid corruption in multithreaded applications, mutexes are used internally to protect the memory-
Linux man-pages 6.05.01 2023-07-20 1415
drive that supports the LOCATE command (device-specific address) or a Tandberg-
Linux man-pages 6.05.01 2023-02-05 2073
stant Bandwidth Server). To set and fetch this policy and associated attributes, one must use the Linux-
Linux man-pages 6.05.01 2023-02-10 2649
If the CPUs in an SMP system have different clock sources, then there is no way to maintain a correla-
Linux man-pages 6.05.01 2023-07-20 87
so would have caused the limit defined by the corresponding file in /proc/sys/user to be ex-
Linux man-pages 6.05.01 2023-05-03 101
but can be more efficient: if the unshared range extends past the current maximum number of file de-
Linux man-pages 6.05.01 2023-05-03 109
dicates that the peer closed its end of the channel. Subsequent reads from the channel will re-
Linux man-pages 6.05.01 2023-04-03 125
ter a successful execve(), and the process would gain privilege because the set-user-ID or set-group-
Linux man-pages 6.05.01 2023-05-03 135
or via a duplicate of the file descriptor created by fork(2), dup(2), fcntl() F_DUPFD, and so on) are al-
Linux man-pages 6.05.01 2023-03-30 162
padding byte in the linux_dirent structure. Thus, on kernels up to and including Linux 2.6.3, attempt-
Linux man-pages 6.05.01 2023-05-03 202
If either field in new_value.it_value is nonzero, then the timer is armed to initially expire at the speci-
Linux man-pages 6.05.01 2023-05-03 212
specified whether changes made to the file after the mmap() call are visible in the mapped re-
Linux man-pages 6.05.01 2023-07-20 395
dom writes to preallocated files, as well as cases where the MS_STRICTATIME mount op-
Linux man-pages 6.05.01 2023-04-03 406
If type is PERF_TYPE_RAW, then a custom "raw" config value is needed. Most CPUs sup-
Linux man-pages 6.05.01 2023-05-03 485
use of the performance counters (including the commonly enabled NMI Watchdog Timer in-
Linux man-pages 6.05.01 2023-05-03 488
tables. It does this by trapping the #BR exceptions that result at first use of missing bounds ta-
Linux man-pages 6.05.01 2023-07-28 551
system calls from executing or to SYSCALL_DISPATCH_FILTER_ALLOW to temporar-
Linux man-pages 6.05.01 2023-07-28 556
Unlike preadv() and pwritev(), if the offset argument is −1, then the current file offset is used and up-
Linux man-pages 6.05.01 2023-05-03 605
behavior. In this circumstance, read(2) has no effect (the datagram remains pending), while recv() con-
Linux man-pages 6.05.01 2023-07-18 613
SECCOMP_GET_NOTIF_SIZES operation, which returns a structure of type seccomp_no-
Linux man-pages 6.05.01 2023-05-03 666
in Linux 2.2, the fixed-size, 32-bit sigset_t type supported by that system call was no longer fit for pur-
Linux man-pages 6.05.01 2023-05-03 772
System V also provides these semantics for signal(). This was bad because the signal might be deliv-
Linux man-pages 6.05.01 2023-03-30 778
Linux 2.4.15. When this is so, the version where the system call appeared in both of the major ker-
Linux man-pages 6.05.01 2023-07-30 838
mation into the notification. One needs to enable this feature explicitly using the UFFD_FEA-
Linux man-pages 6.05.01 2023-05-03 898
malloc(3). If you need certain calls to these two functions to not allocate memory (in signal han-
Linux man-pages 6.05.01 2023-07-20 971
than data in a register, and thus the explicit_bzero() call creates a brief time window where the sen-
Linux man-pages 6.05.01 2023-07-20 988
change the current directory. Note that applications should not themselves change their cur-
Linux man-pages 6.05.01 2023-07-20 1185
not set.) In this case the sb argument passed to fn() contains information returned by per-
Linux man-pages 6.05.01 2023-07-20 1189
host byte order suitable for use as an Internet network address. On success, the converted address is re-
Linux man-pages 6.05.01 2023-07-20 1342
To avoid corruption in multithreaded applications, mutexes are used internally to protect the memory-
Linux man-pages 6.05.01 2023-07-20 1415
flags are not specified, the child inherits the corresponding scheduling attributes from the par-
Linux man-pages 6.05.01 2023-05-03 1524
drive that supports the LOCATE command (device-specific address) or a Tandberg-compati-
Linux man-pages 6.05.01 2023-02-05 2073
which have not been demand-loaded in, or which are swapped out. This value is in-
Linux man-pages 6.05.01 2023-07-08 2171
system interrupts. The first column is the total of all interrupts serviced including un-
Linux man-pages 6.05.01 2023-07-08 2190
The interfaces can be accessed by reading or writing the /proc/sys/net/ipv4/neigh/*/* files. Each inter-
Linux man-pages 6.05.01 2023-07-15 2256
hard disk, with Y in ’a’–’d’; ’sd’ for SCSI compatible disk, with Y in ’a’–’e’), Y the driver let-
Linux man-pages 6.05.01 2023-02-05 2266
The special treatments of user ID 0 (root) described in this subsection can be disabled using the se-
Linux man-pages 6.05.01 2023-05-03 2336
min−guide/cgroup−v1/blkio−controller.rst (or Documentation/cgroup−v1/blkio−con-
Linux man-pages 6.05.01 2023-04-03 2345
A cpuset directory that contains no child cpuset directories, and has no attached processes, can be re-
Linux man-pages 6.05.01 2023-07-18 2369
described in time(7). The setting of sched_relax_domain_level applies only to immediate load balanc-
Linux man-pages 6.05.01 2023-07-18 2374
value 700 (600 before glibc 2.10; 500 before glibc 2.2). In addition, various GNU-specific ex-
Linux man-pages 6.05.01 2023-07-15 2413
EINVAL when this setting is missing. sin_port contains the port in network byte order. The port num-
Linux man-pages 6.05.01 2023-07-15 2446
Always include periods in such abbreviations, as shown here. In addition, "e.g." and "i.e." should al-
Linux man-pages 6.05.01 2023-03-30 2541
calls that operate on process IDs always operate using the process ID that is visible in the PID name-
Linux man-pages 6.05.01 2023-03-30 2593
from write(2) to see how many bytes were actually written), and these bytes may be inter-
Linux man-pages 6.05.01 2023-07-16 2597
stant Bandwidth Server). To set and fetch this policy and associated attributes, one must use the Linux-
Linux man-pages 6.05.01 2023-02-10 2649
(logical OR), the hardware accumulates the bits that are subsequently written to it. The possi-
Linux man-pages 6.05.01 2023-02-05 2696
fault that includes the sender’s PID, real user ID, and real group ID, if the sender did not spec-
Linux man-pages 6.05.01 2023-07-15 2754
to load an arbitrary URI that automatically detects the users’ environment (e.g., text or graphics, desk-
Linux man-pages 6.05.01 2023-04-30 2770
records the user namespace of the creating process as the owner of the new namespace. (This associa-
Linux man-pages 6.05.01 2023-05-03 2775
#!/usr/bin/perl -w
#
# BuildLinuxMan.pl : Build Linux manpages book
# Deri James : 15 Dec 2022
#
# Params:-
#
# $1 = Directory holding the man pages
#
# (C) Copyright 2022, Deri James
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details
# (http://www.gnu.org/licenses/gpl-2.0.html).
#
use strict;
my $dir=shift || '.';
my @aliases=`egrep -l '^\\.so' $dir/man*/*`;
my %alias;
my %target;
my $inTS=0;
my $inBlock=0;
my %Sections=
(
    "1"      => "General Commands Manual",
    "2"      => "System Calls Manual",
    "2type"  => "System Calls Manual (types)",
    "3"      => "Library Functions Manual",
    "3const" => "Library Functions Manual (constants)",
    "3head"  => "Library Functions Manual (headers)",
    "3type"  => "Library Functions Manual (types)",
    "4"      => "Kernel Interfaces Manual",
    "5"      => "File Formats Manual",
    "6"      => "Games Manual",
    "7"      => "Miscellaneous Information Manual",
    "8"      => "System Manager's Manual",
    "9"      => "Kernel Developer's Manual",
);
my $Section='';
my $temp='LMB.man';
LoadAlias();
BuildBook();
my $format='pdf';
my $paper='letter';
my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
my $front='LMBfront.t';
my $frontdit='LMBfront.set';
my $mandit='LinuxManBook.set';
my $book="LinuxManBook.$format";
system("groff -T$format -ms $front -Z > $frontdit");
system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 | LC_ALL=C grep '^\\. *ds' | groff -T$format $cmdstring - $temp -Z > $mandit");
system("./gro$format -F.:/usr/share/groff/current $frontdit $mandit -p$paper > $book");
#unlink "$mandit","$temp","$frontdit"; # If you want to clean up
# Aliases are the man pages which .so another man page, so build a hash of them so
# that when we are processing referenced man page we can add the target for the
# bookmark.
sub LoadAlias
{
    foreach my $fn (@aliases)
    {
        chomp($fn);
        my (@pth)=split('/',$fn);
        my $nm=pop(@pth);
        my $bkmark="$1_$2" if $nm=~m/(.*)\.(\w+)/;

        if (open(F,"<$fn"))
        {
            while (<F>)
            {
                next if m/^\.\\"/;

                # The leading dot is escaped so only a literal ".so" request matches.
                if (m/^\.so\s+(man\w+\/(.+)\.(.+?))$/)
                {
                    $alias{$bkmark}=["$2_$3",$2,$3];
                    push(@{$target{"$2_$3"}},$bkmark);
                    last;
                }
                else
                {
                    print STDERR "Alias fail: $fn\n";
                }
            }

            close(F);
        }
        else
        {
            print STDERR "Open fail: $fn\n";
        }
    }
}
sub BuildBook
{
    open(BK,">$temp");
    print BK ".pdfpagenumbering D . 1\n";

    foreach my $fn (sort sortman glob("$dir/man*/*"))
    {
        my ($nm,$sec,$srt)=GetNmSec($fn);
        my $bkmark="$1_$2" if $nm=~m/(.*)\.(\w+)/;
        my $title= "$1\\($2\\)";

        # If this is an alias, just add it to the outline panel.
        if (exists($alias{$bkmark}))
        {
            print BK ".eo\n.device ps:exec [/Dest /$alias{$bkmark}->[0] /Title ($title) /Level 2 /OUT pdfmark\n.ec\n";
            print BK ".if dPDF.EXPORT .tm .ds pdf:look($bkmark) $alias{$bkmark}->[1]($alias{$bkmark}->[2])\n";
            next;
        }

        print BK ".\\\" >>>>>> $1($2) <<<<<<\n.lf 0 $bkmark\n";

        if (open(F,'<',$fn))
        {
            while (<F>)
            {
                if (m/^\.\\"/)
                {
                    print BK $_;
                    next;
                }

                chomp;

                # This code is to determine whether we are within a tbl block and in a text block
                # T{ and T}. This is fudge code particularly for the syscalls(7) page.
                $inTS=1 if m/\.TS/;
                $inTS=0,$inBlock=0 if m/\.TE/;

                s/\r$//;    # In case edited under windows i.e. CR/LF
                s/\s+$//;
                next if !$_;
                # s/^\s+//;

                if (m/^\.BR\s+([-\w\\.]+)\s+\((.+?)\)(.*)/)
                {
                    my $bkmark="$1";
                    my $sec=$2;
                    my $after=$3;
                    my $dest=$bkmark;
                    $dest=~s/\\-/-/g;
                    $_=".MR \"$bkmark\" \"$sec\" \"$after\" \"$dest\"";
                }

                s/^\.BI \\fB/.BI /;
                s/^\.BR\s+(\S+)\s*$/.B $1/;
                s/^\.BI\s+(\S+)\s*$/.B $1/;
                s/^\.IR\s+(\S+)\s*$/.I $1/;

                # Fiddling for syscalls(7) :-(
                if ($inTS)
                {
                    my @cols=split(/\t/,$_);

                    foreach my $c (@cols)
                    {
                        $inBlock+=()=$c=~m/T\{/g;
                        $inBlock-=()=$c=~m/T\}/g;
                        my $mtch=$c=~s/\s*\\fB([-\w.]+)\\fP\((\w+)\)/\n.MR $1 $2 \\c\n/g;
                        $c="T{\n${c}\nT}" if $mtch and !$inBlock;
                    }

                    $_=join("\t",@cols);
                    s/\n\n/\n/g;
                }

                if (m/^\.TH\s+([-\w\\.]+)\s+(\w+)/)
                {
                    # if new section add top level bookmark
                    if ($sec ne $Section)
                    {
                        print BK ".nr PDFOUTLINE.FOLDLEVEL 1\n.fl\n";
                        print BK ".pdfbookmark 1 $Sections{$sec}\n";
                        print BK ".nr PDFOUTLINE.FOLDLEVEL 2\n";
                        $Section=$sec;
                    }

                    print BK "$_\n";

                    # Add a level two bookmark. We don't set it in the TH macro since the name passed
                    # may be different from the filename, i.e. file = unimplemented.2, TH = UNIMPLEMENTED 2
                    print BK ".pdfbookmark -T $bkmark 2 $1($2)\n";

                    # If this page is referenced by an alias plant a destination label for the alias.
                    if (exists($target{$bkmark}))
                    {
                        foreach my $targ (@{$target{$bkmark}})
                        {
                            print BK ".pdf*href.set $targ\n";
                        }
                    }

                    next;
                }

                print BK "$_\n";
            }

            close(F);
        }
    }

    close(BK);
}
sub GetNmSec
{
    my (@pth)=split('/',shift);
    my $nm=pop(@pth);
    my $sec=substr(pop(@pth),3);
    my $srt=$nm;
    $srt=~s/^_+//;
    $srt="$sec/$srt";
    return($nm,$sec,$srt);
}

# add rpmvercmp
use RPM::VersionSort;

sub sortman
{
    # Sort - ignore case but frig it so that intro is the first entry.
    my (undef,$s1,$c)=GetNmSec($a);
    my (undef,$s2,$d)=GetNmSec($b);

    my $cmp=$s1 cmp $s2;
    return $cmp if $cmp;
    return -1 if ($c=~m/\/intro/ and $d!~m/\/intro/);
    return 1 if ($d=~m/\/intro/ and $c!~m/\/intro/);
    return rpmvercmp(lc($c),lc($d));
    # return (lc($c) cmp lc($d));
}
nodemask argument is ignored. Where a nodemask is required, it must contain at least one node that is on-
Linux man-pages 6.05.01 2023-07-16 384