Re: linux-m68k archival at lore.kernel.org

On Sun, 20 Oct 2019, Geert Uytterhoeven wrote:

Hi all,

I'm working to add this list to lore.kernel.org.

That's great news because lore.kernel.org is a search engine that actually 
works.

As one of the prerequisites they require that we provide full existing
archives of all list messages (or, at least, as complete as possible). 
I've collected mine already, but would really appreciate if you could 
pitch in from your own collection.

Just follow the instructions on this page:
https://korg.wiki.kernel.org/userdoc/lore


For anyone else attempting this, note that linux-m68k has two addresses, 
so you need to pass two '-l' parameters:
-l linux-m68k.vger.kernel.org linux-m68k.lists.linux-m68k.org

The above wiki page neglects to mention that the 'list-archive-maker.py' 
script has serious problems.

It can't deal with Alpine mboxes because they don't mangle "From" in 
message bodies as ">From". This leads to truncated messages.

I strongly recommend that you enable the '-r' parameter and then examine 
all of the rejected messages.

You'll also need to edit the script so that it doesn't capture messages
that were rejected for an obvious reason (wrong list-id) rather than a
messed-up message boundary (i.e. a 'From ' mistakenly used as a message
delimiter).
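
To illustrate the kind of triage I mean, here is a rough Python sketch. It
is not part of 'list-archive-maker.py'; the function name and the category
labels are made up for this example, and the "missing Message-ID or Date
means a damaged boundary" test is only a heuristic I am assuming is good
enough for a first pass.

```python
import email


def classify_reject(raw, wanted_list_ids):
    """Guess why a rejected message was rejected (hypothetical helper).

    raw             -- the rejected message as bytes
    wanted_list_ids -- list-ids that belong to the archive being built
    """
    msg = email.message_from_bytes(raw)

    # Obvious case: the message carries a List-Id for some other list.
    list_id = msg.get("List-Id")
    if list_id is not None:
        lowered = str(list_id).lower()
        if not any(w.lower() in lowered for w in wanted_list_ids):
            return "wrong-list-id"

    # Heuristic (an assumption): a fragment created by a false 'From '
    # split starts in the middle of a body, so it lacks basic headers.
    if msg.get("Message-ID") is None or msg.get("Date") is None:
        return "suspect-boundary"

    return "other"
```

Only the "suspect-boundary" rejects would then need to be examined by hand.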

Another problem with that script is that it captures too much. It will 
grab messages that appear to be cross-posted (based on To: or Cc:) even if 
those messages never reached linux-m68k. I suppose the idea is that 
capturing too much is better than too little?

The script fabricates a missing List-ID header based on a guess. I don't
know why it does this (a bad idea from an archival perspective).

I uploaded the list of message-ids that I already have to
http://users.telenet.be/geertu/linux-m68k-message-ids.tar.xz
You'll need it during the archive sanitization process to pass to the -k switch.

Please tar up and xz -9 the resulting directory with mbox files and send
the archive to me so I can add it to what I already have.

The archives I used, from my personal email collection, are:
  1. linux-activists@xxxxxxxxxxxxxx 680x0 channel digest (May 1993 - March 1995)
     Used initially.  Probably there was never a non-digest version?
  2. linux-680x0@xxxxxxxxxxxxxxxx (Dec 1994 - Dec 1995)
     First real mailing list.  Abandoned due to latency (most developers were
     located in Europe and 2 Mbps transatlantic sucked).
  3. linux-m68k@xxxxxxxxxxxxxx (Oct 1995 - Oct 2004)
     Second mailing list. Abandoned due to spam and lack of admin activity.
     I did my best to remove spam.
  4. linux-m68k@xxxxxxxxxxxxxxx (Oct 2004 - Current)
     Current mailing list.
As this is a single logical mailing list, the plan is to combine all of 
it in a single archive.

My archive should be fairly complete, except for network outages, and e.g.
the Gandi email disaster week 2 years ago.  And I don't have anything from
the real early days, unfortunately.


I'll let you know if I find any missing messages here.

Note that the sanitization script choked on some mails from the old
phil.uni-sb.de list, so it didn't succeed for me.


Was that the "From" bug? I am experimenting with pre-processing of mboxes 
to substitute the "From" lines in the message bodies. Not yet sure if this 
will be entirely successful...
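
For the curious, the general idea looks something like the Python sketch
below. It is a simplified illustration, not the exact code I am running.
It assumes that a genuine mbox separator follows a blank line (or the start
of the file) and ends in a ctime-style date, optionally with a timezone
offset; any other line starting with "From " is treated as body text and
becomes ">From ". The function and pattern names are invented for this
example.

```python
import re

# Assumed shape of a real separator: "From <sender> <ctime-style date>",
# optionally followed by a numeric timezone offset.
SEPARATOR = re.compile(
    rb"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ \d]\d "
    rb"\d\d:\d\d:\d\d \d{4}( [+-]\d{4})?\s*$"
)


def mangle_body_from(lines):
    """Return a copy of lines with body 'From ' lines escaped as '>From '."""
    out = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in lines:
        if line.startswith(b"From "):
            is_separator = prev_blank and SEPARATOR.match(line) is not None
            if not is_separator:
                line = b">" + line
        out.append(line)
        prev_blank = line.strip() == b""
    return out
```

A body line that follows a blank line and happens to look exactly like a
separator would still be taken for one, so the result needs checking.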

-- 

Thanks!

Gr{oetje,eeting}s,

                        Geert




