On Sun, 20 Oct 2019, Geert Uytterhoeven wrote:
Hi all, I'm working to add this list to lore.kernel.org.
That's great news because lore.kernel.org is a search engine that actually works.
As one of the prerequisites they require that we provide full existing archives of all list messages (or, at least, as complete as possible). I've collected mine already, but would really appreciate it if you could pitch in from your own collection. Just follow the instructions on this page: https://korg.wiki.kernel.org/userdoc/lore
For anyone else attempting this, note that linux-m68k has two addresses, so you need to pass two '-l' parameters:

    -l linux-m68k.vger.kernel.org linux-m68k.lists.linux-m68k.org

The above wiki page neglects to mention that the 'list-archive-maker.py' script has serious problems. It can't deal with Alpine mboxes because they don't mangle "From" in message bodies as ">From". This leads to truncated messages.

I strongly recommend that you enable the '-r' parameter and then examine all of the rejected messages. You'll also need to edit the script so that it does not capture messages that were rejected for obvious reasons (wrong list-id) rather than because of a messed-up message boundary (i.e. a 'From ' mistakenly used as a message delimiter).

Another problem with that script is that it captures too much. It will grab messages that appear to be cross-posted (based on To: or Cc:) even if those messages never reached linux-m68k. I suppose the idea is that capturing too much is better than too little?

The script fabricates a missing List-ID header based on a guess. I don't know why it does this (bad idea from an archival perspective).
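To illustrate the over-capture point, here is a minimal sketch of a stricter test that accepts a message only if it carries one of the wanted list IDs in an actual List-Id header, ignoring To: and Cc: entirely. This is not how 'list-archive-maker.py' works; the function names and the example.org addresses are hypothetical. Note that such a check would also drop older messages that predate List-Id headers, so it is only useful for triage.

```python
import email
import re

def claimed_list_ids(raw_message):
    """Return the lower-cased list IDs found in the List-Id headers of a raw message."""
    msg = email.message_from_string(raw_message)
    ids = []
    for value in msg.get_all("List-Id", []):
        # The header may look like "Description <list.id.example>" or just "<list.id.example>".
        match = re.search(r"<([^>]+)>", value)
        ids.append((match.group(1) if match else value).strip().lower())
    return ids

def has_wanted_list_id(raw_message, wanted):
    """True only if a List-Id header names one of the wanted lists (To:/Cc: are ignored)."""
    return any(list_id in wanted for list_id in claimed_list_ids(raw_message))

# The two list IDs mentioned above.
WANTED = {"linux-m68k.vger.kernel.org", "linux-m68k.lists.linux-m68k.org"}

# Hypothetical sample messages for illustration only.
on_list = "List-Id: <linux-m68k.vger.kernel.org>\nSubject: a\n\nbody\n"
cc_only = "To: someone@example.org\nCc: linux-m68k@example.org\nSubject: b\n\nbody\n"
```

With these samples, `has_wanted_list_id(on_list, WANTED)` is true and `has_wanted_list_id(cc_only, WANTED)` is false, since the second message only mentions a list address in Cc:.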
I uploaded the list of message-ids that I already have to http://users.telenet.be/geertu/linux-m68k-message-ids.tar.xz

You'll need it during the archive sanitization process to pass to the -k switch. Please tar up and xz -9 the resulting directory with mbox files and send the archive to me so I can add it to what I already have.

The archives I used, from my personal email collection, are:

  1. linux-activists@xxxxxxxxxxxxxx 680x0 channel digest (May 1993 - March 1995)
     Used initially. Probably there was never a non-digest version?

  2. linux-680x0@xxxxxxxxxxxxxxxx (Dec 1994 - Dec 1995)
     First real mailing list. Abandoned due to latency (most developers were located in Europe and 2 Mbps transatlantic sucked).

  3. linux-m68k@xxxxxxxxxxxxxx (Oct 1995 - Oct 2004)
     Second mailing list. Abandoned due to spam and lack of admin activity. I did my best to remove spam.

  4. linux-m68k@xxxxxxxxxxxxxxx (Oct 2004 - Current)
     Current mailing list.

As this is a single logical mailing list, the plan is to combine all of it in a single archive.

My archive should be fairly complete, except for network outages, and e.g. the Gandi email disaster week 2 years ago. And I don't have anything from the real early days, unfortunately.
I'll let you know if I find any missing messages here
Note that the sanitization script choked on some mails from the old phil.uni-sb.de list, so it didn't succeed for me.
Was that the "From" bug? I am experimenting with pre-processing of mboxes to substitute the "From" lines in the message bodies. Not yet sure if this will be entirely successful...
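For reference, the kind of pre-processing meant here could look roughly like the sketch below. It is only a heuristic, not the actual script in use: a line starting with "From " is treated as a real message separator only if it follows a blank line (or starts the file) and looks like "From <sender> <asctime-style date>". Every other such line is rewritten as ">From ". The regular expression and the function name are assumptions, and a body line that happens to look like a separator after a blank line would still be misclassified.

```python
import re

# Heuristic for a genuine mbox separator line, e.g.
# "From geert@example.org Sun Oct 20 12:34:56 2019" (optional numeric timezone allowed).
ENVELOPE = re.compile(
    r"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d(:\d\d)?( [+-]\d{4})? \d{4}\s*$"
)

def escape_body_from_lines(lines):
    """Prefix '>' to 'From ' lines that do not look like real mbox separators."""
    out = []
    prev_blank = True  # the start of the file counts as a message boundary
    for line in lines:
        if line.startswith("From "):
            if not (prev_blank and ENVELOPE.match(line)):
                line = ">" + line
        out.append(line)
        prev_blank = line.strip() == ""
    return out

# Hypothetical sample mbox content for illustration only.
sample = [
    "From geert@example.org Sun Oct 20 12:34:56 2019\n",
    "Subject: test\n",
    "\n",
    "Hello,\n",
    "\n",
    "From what I can tell it works.\n",
    "From here on, too.\n",
]
escaped = escape_body_from_lines(sample)
```

When reading a real mbox, opening the file with encoding="latin-1" and newline="" lets arbitrary bytes round-trip unchanged through this function.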
Thanks! Gr{oetje,eeting}s, Geert