Hi Brian,

On 2023-08-28 20:24, Brian Inglis wrote:
> On 2023-08-28 06:17, Alejandro Colomar wrote:
>> Hi Brian,
>>
>> On 2023-08-22 01:45, Brian Inglis wrote:
>>> I am in favour of all punctuation being treated as word spaces and sorting
>>> "cat ..." before "cat..." but find the real orders more evocative and easier to
>>> decide about than examples.
>>
>> Here's an excerpt of how treating - and _ as spaces looks like. I think
>> it's a reasonable order. Should I apply that diff?
>>
>> Cheers,
>> Alex
>>
>> $ git diff
>> diff --git a/scripts/sortman b/scripts/sortman
>> index a8f70bab5..6d1d92f09 100755
>> --- a/scripts/sortman
>> +++ b/scripts/sortman
>> @@ -9,7 +9,7 @@ sed -E '/\/intro./ s/.*\.([[:digit:]])/\10\t&/' \
>> | sed -E ' s/\t(.*)/&\n\1/' \
>> | sed -E '/\t/ s/\.[[:digit:]]([[:alpha:]][[:alnum:]]*)?\>.*//' \
>> | sed -E '/\t/ s/\/[_-]*/\//g' \
>> -| sed -E '/\t/ s/[_-]/_/g' \
>> +| sed -E '/\t/ s/[_-]/ /g' \
>> | sed -E '/\t/ {N;s/\n/\t/;}' \
>> | sort -fV -k1,2 \
>> | cut -f3;
>> $ touch man8/ld-z.8
>> $ touch man8/ld.8
>> $ find man8 | ./scripts/sortman
>> man8/intro.8
>> man8/iconvconfig.8
>> man8/ld.8
>> man8/ld-linux.8
>> man8/ld-linux.so.8
>> man8/ld-z.8
>> man8/ld.so.8
>> man8/ldconfig.8
>> man8/nscd.8
>> man8/sln.8
>> man8/tzselect.8
>> man8/zdump.8
>> man8/zic.8
>> man8
>
> Looks better,

Thanks, I've applied and pushed the patch.

> but should your sort *key* field instance also drop the section
> suffix (already in prefix)

It is already dropped. Am I understanding it correctly?

Here's a debug patch to view the sort key field:

diff --git a/scripts/sortman b/scripts/sortman
index 6d1d92f09..e690f23ea 100755
--- a/scripts/sortman
+++ b/scripts/sortman
@@ -12,4 +12,5 @@ sed -E '/\/intro./ s/.*\.([[:digit:]])/\10\t&/' \
 | sed -E '/\t/ s/[_-]/ /g' \
 | sed -E '/\t/ {N;s/\n/\t/;}' \
 | sort -fV -k1,2 \
+| tee /dev/tty \
 | cut -f3;

And here's how it looks with man8 (plus the dummy files):

$ find man8 -type f | ./scripts/sortman
80	man8/intro	man8/intro.8
81	man8/iconvconfig	man8/iconvconfig.8
81	man8/ld	man8/ld.8
81	man8/ld linux	man8/ld-linux.8
81	man8/ld linux.so	man8/ld-linux.so.8
81	man8/ld z	man8/ld-z.8
81	man8/ld.so	man8/ld.so.8
81	man8/ldconfig	man8/ldconfig.8
81	man8/nscd	man8/nscd.8
81	man8/sln	man8/sln.8
81	man8/tzselect	man8/tzselect.8
81	man8/zdump	man8/zdump.8
81	man8/zic	man8/zic.8
man8/intro.8
man8/iconvconfig.8
man8/ld.8
man8/ld-linux.8
man8/ld-linux.so.8
man8/ld-z.8
man8/ld.so.8
man8/ldconfig.8
man8/nscd.8
man8/sln.8
man8/tzselect.8
man8/zdump.8
man8/zic.8

There are no suffixes in the second field.

> and also treat "." as space?

I had been thinking about it, but didn't form an opinion. Since they
are rare, I think making them stand out a little bit by having a special
order rather than just being mixed with the underscores would make
sense. But I'm open to changing that.

> Where would you expect to see ld.so?

Not sure.

>
> Also, in `sed`, instead of cloning the line, at the start of a series of
> executions, make them all into a single inline command script, start with `h` to
> *hold* the input line, and end with `G` instead of `N` to append '\n' then the
> held line, convert to `\t`, drop the braces, and you can skip the then redundant
> tests, something like the following should get you close (tried it earlier, now
> sadly already gone from history):
>
> | sed -E '
> h
> /\/intro./ s/.*\.([[:digit:]])/\10\t&/
> s/\.[[:digit:]]([[:alpha:]][[:alnum:]]*)?\>.*//
> s/\/[_-]*/\//g
> s/[_-]/_/g
> s/[_-]/ /g
> G
> s/\n/\t/
> ' \
> | ...

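For anyone following the thread who hasn't used the hold space, here is
a minimal sketch of the h/G idea quoted above, as I understand it. It is
not scripts/sortman: the function name `keyed` is made up for the
example, only the [_-] substitution is kept, and it assumes GNU sed for
`\t` in the replacement (which the current script already relies on).

```shell
#!/bin/sh
# Illustrative sketch only: "keyed" is a hypothetical helper, not part of
# scripts/sortman.  It prints "<sort key><TAB><original line>" per input
# line from a single sed process, instead of cloning the line with N.
keyed()
{
	sed -E '
		# h: copy the pattern space (the input line) to the hold space
		h
		# build the key in the pattern space: - and _ become spaces
		s/[_-]/ /g
		# G: append a newline plus the hold space (the original line)
		G
		# join key and original with a tab (GNU sed extension)
		s/\n/\t/
	'
}

printf '%s\n' man8/ld-linux.8 man8/ld.so.8 | keyed
```

Each output line is the modified key, a tab, and then the untouched
input line, which is the same shape the pipeline feeds to sort.
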
I prefer having many one-liners for a few reasons:

- Not everybody knows what h and G do. I didn't. And I will soon
  forget. In contrast, my implementation has nothing rare in it.

- I can inspect the contents at each of the steps easily by adding a
  line with `| tee /dev/tty \`, for debug purposes.

In general, I avoid having large scripts in other languages. I prefer
piping many one-liners, even if it might be less efficient (but it uses
more cores, so it might end up being faster; I've seen such things
happen already many times).

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5