Re: Original source file name?

Earl Hood <ehood@hydra.acs.uci.edu> · Fri, 01 Jun 2001 17:43:47 -0700

On May 31, 2001 at 20:54, J C Lawrence wrote:

> Good point.  As the ultimate goal is to shove the entire message
> base into an SQL DB (I've got users begging for things like
> thread-bounded searches and the ability to gen meta views of an
> archive), I'll probably head that way.
> 
> While it's a gruesome hack, I'm ultimately looking to use MHonArc as
> a front end processor which writes scripts as output which are then
> executed to input the message and all its particulars into an
> SQL DB.  What I haven't figured out yet is how to properly extract
> the thread linkings for input into the DB, as well as how to
> effectively (i.e. scalably) provide the thread database to MHonArc
> when archiving a message (we're talking hundreds of thousands of
> messages, possibly small order millions).

It's on my TODO list to allow callback hooks during MHonArc processing.
The problem is that to allow a decent callback API, some of the internal
functions need changing.  Probably something for a 2.5 release (whenever
that is).

With a hook, you can store the message-ids and references/in-reply-to
data in a DB, and then compute the threads from that.  This is
basically what MHonArc does.
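
To make the idea concrete, here is a minimal sketch of computing thread
links from stored message-ids and References/In-Reply-To data.  It is
not MHonArc's code: the %msgs layout, the sample message-ids, and the
parent_of() helper are all assumptions made for illustration, and it
ignores subject-based matching and reference loops.

```perl
use strict;
use warnings;

# Hypothetical rows, as a callback hook might have stored them in a DB:
# message-id => { in_reply_to => id, references => [ids, oldest first] }
my %msgs = (
    '<a@example.org>' => { references => [] },
    '<b@example.org>' => { in_reply_to => '<a@example.org>',
                           references  => ['<a@example.org>'] },
    '<c@example.org>' => { references  => ['<a@example.org>',
                                           '<b@example.org>'] },
    '<d@example.org>' => { in_reply_to => '<missing@example.org>',
                           references  => [] },
);

# Pick a parent: prefer In-Reply-To, else the most recent References
# entry.  Only ids that are actually stored count as parents.
sub parent_of {
    my ($id) = @_;
    my $m = $msgs{$id};
    my @candidates = (
        (defined $m->{in_reply_to} ? ($m->{in_reply_to}) : ()),
        reverse @{ $m->{references} || [] },
    );
    for my $p (@candidates) {
        return $p if exists $msgs{$p} && $p ne $id;
    }
    return;    # no known parent: treat as a thread root
}

# Build the reply graph: parent id => list of child ids, plus roots.
my (%children, @roots);
for my $id (sort keys %msgs) {
    my $p = parent_of($id);
    if (defined $p) { push @{ $children{$p} }, $id }
    else            { push @roots, $id }
}

# Walk each thread depth-first and print an indented listing.
sub show {
    my ($id, $depth) = @_;
    print '  ' x $depth, $id, "\n";
    show($_, $depth + 1) for @{ $children{$id} || [] };
}
show($_, 0) for @roots;
```

A message whose parent is not in the store (the last sample entry) ends
up as a root here; whether that is the right policy depends on how you
want missing messages handled.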

> At that point my main interests in MHonArc are its excellent MIME
> and charset handling (damned fine job BTW).  I'd like to also use it
> to build the thread graph rather than dynamically building it off
> the References/In-Reply-To headers, as MHonArc properly
> handles the matching-subject thread hits.

With the current code base, you can access the thread listing order.

There are multiple approaches, but one is creating a custom mhonarc
that dumps the thread data after an archive update in whatever format
you need.  Two main variables are created when generating the thread
data: @TListOrder and %Index2TLoc.  The first is a list of message
indexes in the order to be rendered on a thread index page.  The
second is a hash that maps a message index to its ordinal position in
the thread index (useful in resource variable resolution).

Also generated is the %ThreadLevel hash.  This maps a message index
to the thread depth of the message.  A depth of 0 means it is a
root-level message.  Therefore, with @TListOrder and %ThreadLevel one
can infer the thread tree structure.
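
As a rough sketch of that inference: walk @TListOrder while keeping a
stack of the most recent message seen at each depth; the parent of a
message at depth N is whatever sits at depth N-1.  The sample data and
the m1..m6 index names below are made up for illustration (real MHonArc
indexes look different), and the tab-separated output is just one
possible format to feed a DB loader.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape described above.
my @TListOrder  = qw(m1 m2 m3 m4 m5 m6);
my %ThreadLevel = (m1 => 0, m2 => 1, m3 => 2, m4 => 1, m5 => 0, m6 => 1);

my (%parent, @stack);
for my $idx (@TListOrder) {
    my $depth = $ThreadLevel{$idx} || 0;

    # Entries at this depth or deeper belong to branches that have
    # ended, so truncate the stack to $depth elements.
    $#stack = $depth - 1 if @stack > $depth;

    # The parent is the last message seen one level up.  It stays
    # undef for roots, or if the listing skips a level.
    $parent{$idx} = $depth > 0 ? $stack[$depth - 1] : undef;
    $stack[$depth] = $idx;
}

# Emit one row per message: index, parent index (or NULL), depth.
for my $idx (@TListOrder) {
    my $p = defined $parent{$idx} ? $parent{$idx} : 'NULL';
    print join("\t", $idx, $p, $ThreadLevel{$idx}), "\n";
}
```

With the sample data, m2 and m4 come out as replies to m1, m3 as a
reply to m2, and m6 as a reply to m5.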

These structures represent message threads sequentially rather than
as a tree, but they are conducive to generating the HTML thread index
pages since that is done in a sequential manner.  Also, in Perl 4
days, doing complex tree structures was a non-trivial task.

BTW, the following is a snippet from mhinit.pl:

  ##      Following variables used in thread computation

  @ThreadList       = ();   # List of messages visible in thread index
  @NotIdxThreadList = ();   # List of messages not visible in index
  %HasRef           = ();   # Flags if message has references (Keys = indexes)
                            #       (Values = reference message indexes)
  %HasRefDepth      = ();   # Depth of reference from HasRef value
  %Replies          = ();   # Msg-ids of explicit replies (Keys = indexes)
  %SReplies         = ();   # Msg-ids of subject-based replies (Keys = indexes)
  %TVisible         = ();   # Message visible in thread index (Keys = indexes)
  $DoMissingMsgs    =  0;   # Flag if missing messages should be noted in index

Unfortunately, my memory needs refreshing on all the threading stuff,
so I'm probably forgetting something.  The multi-page index support
does complicate some of the stuff (hence the visible/non-visible
comments).

--ewh