
Re: Original source file name?



On Fri, 01 Jun 2001 17:43:47 -0700 
Earl Hood <ehood@hydra.acs.uci.edu> wrote:

> On May 31, 2001 at 20:54, J C Lawrence wrote:
>> Good point.  As the ultimate goal is to shove the entire message
>> base into an SQL DB (I've got users begging for things like
>> thread-bounded searches and the ability to gen meta views of an
>> archive), I'll probably head that way.
>> 
>> While it's a gruesome hack, I'm ultimately looking to use MHonArc
>> as a front-end processor that writes scripts as output, which are
>> then executed to load the message and all its particulars
>> into an SQL DB.  What I haven't figured out yet is how to
>> properly extract the thread linkings for input into the DB, as
>> well as how to effectively (ie scalably) provide the thread
>> database to MHonArc when archiving a message (we're talking
>> hundreds of thousands of messages, possibly small order
>> millions).

> It's on my TODO list to allow callback hooks during MHonArc
> processing.  The problem is that to allow a decent callback API,
> some of the internal functions need changing.  Something for
> probably a 2.5 release (whenever that is).

> With a hook, you can store the message-ids and
> references/in-reply-to data in a DB, and then compute the threads
> from that.  This is basically what MHonArc does.
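
For the archive, a minimal sketch of what "compute the threads from
that" could look like once the message-ids and references are sitting
in a store.  This is not MHonArc's code: the function names and the
(msgid, refs) input shape are made up for illustration, and it assumes
each References list is ordered oldest first with the In-Reply-To id
last.

```python
def build_threads(messages):
    """Link each message to its nearest known ancestor.

    `messages` is a list of (msgid, refs) pairs (a hypothetical input
    shape), where refs lists referenced ids oldest first.
    """
    known = {msgid for msgid, _ in messages}
    parent = {}
    for msgid, refs in messages:
        # Scan newest to oldest so the closest archived ancestor wins.
        parent[msgid] = next(
            (r for r in reversed(refs) if r in known and r != msgid), None
        )
    children = {}
    for msgid, _ in messages:
        children.setdefault(parent[msgid], []).append(msgid)
    return parent, children


def thread_order(children):
    """Depth-first walk returning (msgid, depth) in thread-index order.

    Messages caught in a reference cycle have no root and are skipped;
    a real implementation would need to break such cycles.
    """
    out = []
    stack = [(root, 0) for root in reversed(children.get(None, []))]
    while stack:
        msgid, depth = stack.pop()
        out.append((msgid, depth))
        stack.extend((c, depth + 1) for c in reversed(children.get(msgid, [])))
    return out
```

A message whose references all point outside the archive simply
becomes a root here; subject-based matching is not attempted.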

<nod>

How does MHonArc currently attempt to thread messages which are
missing In-Reply-To/References headers, but which share date and
subject strings with an extant thread?  

>> At that point my main interests in MHonArc are its excellent MIME
>> and charset handling (damned fine job BTW).  I'd like to also use
>> it to build the thread graph rather than building it dynamically
>> off the References/In-Reply-To headers, as MHonArc
>> properly handles the matching-subject thread hits.

> With the current code base, you can access the thread listing
> order.

> There are multiple approaches, but one is creating a custom
> mhonarc that does a dump of thread data after an archive update in
> some format you need.  Two main variables are created when
> generating the thread data: @TListOrder and %Index2TLoc.  The
> first is a list of message indexes in the order to be rendered on
> a thread index page.  The second is a hash that maps a message
> index to the ordinal thread index position (useful in resource
> variable resolution).

> Also generated is the %ThreadLevel hash.  This maps a message
> index to the thread depth of the message.  A depth of 0 means it
> is a root-level message.  Therefore, with @TListOrder and
> %ThreadLevel one can infer the thread tree structure.
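
The inference described above can be sketched roughly as follows.
This is Python rather than Perl and is only an illustration: the
arguments stand in for dumps of @TListOrder and %ThreadLevel, and the
function name is hypothetical.

```python
def parents_from_order(t_list_order, thread_level):
    """Recover parent links from a thread-ordered list plus depths.

    t_list_order: message indexes in thread-index order
                  (stand-in for a dump of @TListOrder).
    thread_level: dict mapping message index to depth, 0 = root
                  (stand-in for a dump of %ThreadLevel).
    """
    parent = {}
    ancestors = []  # ancestors[d] is the latest message seen at depth d
    for idx in t_list_order:
        # Clamp the depth in case the dump skips a level.
        depth = min(thread_level[idx], len(ancestors))
        parent[idx] = ancestors[depth - 1] if depth > 0 else None
        # Drop everything at this depth or deeper, then record this message.
        del ancestors[depth:]
        ancestors.append(idx)
    return parent
```

One pass over the list with a stack of current ancestors is enough,
so the cost is linear in the number of messages.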

Yup.  My problem is that both of these data sets suffer badly when
they get large (eg ~500K - 1 million messages).  Methinks I'll have
to shove all the thread data into the external DB and then attempt
to have MHonArc either build deltas against it (eg the dump you
mentioned above), or do the heavy lifting in determining
insertion points for unthreaded messages which look like thread
members, and do everything else at HTML page generation time.
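
A rough sketch of the external-DB side, using SQLite through Python's
sqlite3 module purely for illustration.  The table layout, column
names, and helper functions are all assumptions, not anything MHonArc
provides; the point is only that storing a thread root per message
makes a thread-bounded search a single indexed query.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Create the (hypothetical) message table and its thread index."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               msgid TEXT PRIMARY KEY,
               parent TEXT,
               thread_root TEXT NOT NULL,
               subject TEXT,
               body TEXT)"""
    )
    db.execute(
        "CREATE INDEX IF NOT EXISTS messages_by_thread ON messages (thread_root)"
    )
    return db


def add_message(db, msgid, parent, subject, body):
    """Insert one message, inheriting the thread root from its parent."""
    row = db.execute(
        "SELECT thread_root FROM messages WHERE msgid = ?", (parent,)
    ).fetchone()
    root = row[0] if row else msgid  # unknown or missing parent: new thread
    db.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        (msgid, parent, root, subject, body),
    )
    return root


def search_thread(db, msgid, term):
    """Return ids of messages in msgid's thread whose body contains term."""
    row = db.execute(
        "SELECT thread_root FROM messages WHERE msgid = ?", (msgid,)
    ).fetchone()
    if row is None:
        return []
    rows = db.execute(
        "SELECT msgid FROM messages WHERE thread_root = ? AND body LIKE ? "
        "ORDER BY rowid",
        (row[0], f"%{term}%"),
    )
    return [r[0] for r in rows]
```

This assumes parents are inserted before their replies; a late-arriving
parent would mean re-rooting its descendants, which is not handled here.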

> These structures are a sequential way of representing message
> threads, but are conducive to generating the HTML thread index
> pages since that is done in a sequential manner.  Also, in Perl 4
> days, doing complex tree structures was a non-trivial task.

I believe.  I've been dumbstruck to find that PHP does not have an
ordered collection type which allows insertions (eg a list, vector,
etc).  They've got associative arrays and objects as their derived
types, and then a fairly simple set of base types.

<sigh>

Building one is going to be un-fun.  Best I can think of so far is
doing evil work with associative array key generation, but that
*really* is evil.

> BTW, the following is a snippet from mhinit.pl:

> Unfortunately, my memory needs refreshing on all the threading
> stuff, so I'm probably forgetting something.  The multi-page index
> support does complicate some of the stuff (hence the
> visible/non-visible comments).

Yeah, I'll have to dig into this.  Thanks.

-- 
J C Lawrence                                       claw@kanga.nu
---------(*)                          http://www.kanga.nu/~claw/
The pressure to survive and rhetoric may make strange bedfellows

