Re: No 6.05/.01 pdf book available

Alejandro Colomar <alx@xxxxxxxxxx> · Mon, 28 Aug 2023 23:11:56 +0200

Hi Brian,

On 2023-08-28 20:24, Brian Inglis wrote:
> On 2023-08-28 06:17, Alejandro Colomar wrote:
>> Hi Brian,
>>
>> On 2023-08-22 01:45, Brian Inglis wrote:
>>> I am in favour of all punctuation being treated as word spaces and sorting
>>> "cat ..." before "cat..." but find the real orders more evocative and easier to
>>> decide about than examples.
>>
>> Here's an excerpt of what treating - and _ as spaces looks like.  I think
>> it's a reasonable order.  Should I apply that diff?
>>
>> Cheers,
>> Alex
>>
>> $ git diff
>> diff --git a/scripts/sortman b/scripts/sortman
>> index a8f70bab5..6d1d92f09 100755
>> --- a/scripts/sortman
>> +++ b/scripts/sortman
>> @@ -9,7 +9,7 @@ sed   -E '/\/intro./  s/.*\.([[:digit:]])/\10\t&/' \
>>   | sed -E '            s/\t(.*)/&\n\1/' \
>>   | sed -E '/\t/        s/\.[[:digit:]]([[:alpha:]][[:alnum:]]*)?\>.*//' \
>>   | sed -E '/\t/        s/\/[_-]*/\//g' \
>> -| sed -E '/\t/        s/[_-]/_/g' \
>> +| sed -E '/\t/        s/[_-]/ /g' \
>>   | sed -E '/\t/        {N;s/\n/\t/;}' \
>>   | sort -fV -k1,2 \
>>   | cut -f3;
>> $ touch man8/ld-z.8
>> $ touch man8/ld.8
>> $ find man8 | ./scripts/sortman
>> man8/intro.8
>> man8/iconvconfig.8
>> man8/ld.8
>> man8/ld-linux.8
>> man8/ld-linux.so.8
>> man8/ld-z.8
>> man8/ld.so.8
>> man8/ldconfig.8
>> man8/nscd.8
>> man8/sln.8
>> man8/tzselect.8
>> man8/zdump.8
>> man8/zic.8
>> man8
> 
> Looks better,

Thanks, I've applied and pushed the patch.

> but should your sort *key* field instance also drop the section 
> suffix (already in prefix)

It is already dropped.  Am I understanding it correctly?
Here's a debug patch to view the sort key field:

diff --git a/scripts/sortman b/scripts/sortman
index 6d1d92f09..e690f23ea 100755
--- a/scripts/sortman
+++ b/scripts/sortman
@@ -12,4 +12,5 @@ sed   -E '/\/intro./  s/.*\.([[:digit:]])/\10\t&/' \
 | sed -E '/\t/        s/[_-]/ /g' \
 | sed -E '/\t/        {N;s/\n/\t/;}' \
 | sort -fV -k1,2 \
+| tee /dev/tty \
 | cut -f3;


And here's how it looks with man8 (plus the dummy files):


$ find man8 -type f | ./scripts/sortman
80	man8/intro	man8/intro.8
81	man8/iconvconfig	man8/iconvconfig.8
81	man8/ld	man8/ld.8
81	man8/ld linux	man8/ld-linux.8
81	man8/ld linux.so	man8/ld-linux.so.8
81	man8/ld z	man8/ld-z.8
81	man8/ld.so	man8/ld.so.8
81	man8/ldconfig	man8/ldconfig.8
81	man8/nscd	man8/nscd.8
81	man8/sln	man8/sln.8
81	man8/tzselect	man8/tzselect.8
81	man8/zdump	man8/zdump.8
81	man8/zic	man8/zic.8
man8/intro.8
man8/iconvconfig.8
man8/ld.8
man8/ld-linux.8
man8/ld-linux.so.8
man8/ld-z.8
man8/ld.so.8
man8/ldconfig.8
man8/nscd.8
man8/sln.8
man8/tzselect.8
man8/zdump.8
man8/zic.8


There are no suffixes in the second field.

> and also treat "." as space?

I had been thinking about it, but haven't formed an opinion.
Since they are rare, I think making them stand out a little bit
by having a special order rather than just being mixed with the
underscores would make sense.  But I'm open to change that.

> Where would you expect to see ld.so?

Not sure.

> 
> Also, in `sed`, instead of cloning the line, at the start of a series of 
> executions, make them all into a single inline command script, start with `h` to 
> *hold* the input line, and end with `G` instead of `N` to append '\n' then the 
> held line, convert to `\t`, drop the braces, and you can skip the then redundant 
> tests, something like the following should get you close (tried it earlier, now 
> sadly already gone from history):
> 
> | sed -E '
> 	h
> 	/\/intro./  s/.*\.([[:digit:]])/\10\t&/
> 	s/\.[[:digit:]]([[:alpha:]][[:alnum:]]*)?\>.*//
> 	s/\/[_-]*/\//g
> 	s/[_-]/_/g
> 	s/[_-]/ /g
> 	G
> 	s/\n/\t/
> 	' \
> | ...

I prefer having many one-liners for a few reasons:

-  Not everybody knows what h and G do.  I didn't.  And I will
   soon forget.  In contrast, my implementation has nothing
   rare in it.

-  I can inspect the contents at each of the steps easily by
   adding a line with `| tee /dev/tty \`, for debug purposes.
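
For anyone who, like me, doesn't have h and G memorized, here's a
minimal sketch of what they do, as far as I understand them.  It is a
simplified illustration, not the real scripts/sortman: key_and_line is
a made-up name, the section-suffix regex is reduced to a single digit,
and \t in the replacement assumes GNU sed.

```shell
# Hypothetical helper (not part of scripts/sortman): for each input
# line, print "<sort key><TAB><original line>".
key_and_line()
{
	sed -E '
		# h: copy the pattern space (the input line) to the hold space.
		h
		# Build the key: drop a trailing one-digit section suffix...
		s/\.[[:digit:]]$//
		# ...and treat - and _ as spaces.
		s/[_-]/ /g
		# G: append a newline plus the held (original) line.
		G
		# Join key and original line with a tab (GNU sed extension).
		s/\n/\t/
	'
}

printf '%s\n' man8/ld-linux.8 | key_and_line
# prints: man8/ld linux<TAB>man8/ld-linux.8
```

Each line comes out as "key<TAB>original", which is roughly the shape
my pipeline builds by cloning the line (minus the section prefix).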

In general, I avoid embedding large scripts written in other languages.
I prefer piping many one-liners, even if that might be less
efficient (although it uses more cores, so it may end up being
faster; I've seen that happen many times already).

Cheers,
Alex 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
