
Resource files for XML



I've attached some resource files for producing XML output. They are
specifically intended to be pulled into PHP using XML_Unserializer
with minimal options. An example of this in action can be seen at
<http://www.tree-care.info/uktc/archive>

There are two main files. xml-nested.mrc was my starting point. This was
eventually abandoned so is less well developed than xml.mrc, but is
included for information.

The difference between the two is in how threads are represented, which
also affects the threadslices in messages. xml-nested produces xml with
a <list> element which contains <message> elements. Each message element
represents the top of a thread. A <message> element may contain a
<followups> element, which itself contains further <message> elements.
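
To make the nesting concrete, here is a minimal sketch of walking such a document. It uses PHP's built-in SimpleXML rather than XML_Unserializer, purely so that it runs without PEAR. The sample document and its subject text are invented for illustration, and thread_lines() is a hypothetical helper name.

```php
<?php
// Hypothetical sample of the nested layout; the subjects are invented.
$doc = <<<XML
<list>
  <message>
    <subject>First post</subject>
    <followups>
      <message>
        <subject>Re: First post</subject>
        <followups>
          <message><subject>Re: Re: First post</subject></message>
        </followups>
      </message>
    </followups>
  </message>
  <message>
    <subject>Second thread</subject>
  </message>
</list>
XML;

// Return one indented line per message, recursing into each <followups>.
function thread_lines(SimpleXMLElement $msg, int $depth = 0): array
{
    $lines = [str_repeat('  ', $depth) . (string) $msg->subject];
    foreach ($msg->followups as $followups) {
        foreach ($followups->message as $reply) {
            $lines = array_merge($lines, thread_lines($reply, $depth + 1));
        }
    }
    return $lines;
}

$list = simplexml_load_string($doc);
$lines = [];
foreach ($list->message as $top) {
    // Each top-level <message> is the start of a thread.
    $lines = array_merge($lines, thread_lines($top));
}
echo implode("\n", $lines), "\n";
```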

In practice there are a couple of issues with this, at least on the
version of MHonArc I originally tested it on (2.6.8):

Threadslices don't seem to flatten properly. Instead of closing a
<message> tag and opening another, a new <message> tag is opened inside
the old, and then all closed at the end. Because it's flattened, the
<followups> container is missed.

I couldn't get rid of a </followups><followups> output between 'proper'
threaded messages and subject threaded ones. Consequently a message with
'possible followups' ends up with two <followups> elements.
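
One way a consumer might cope with the doubled <followups> is to gather the replies from every <followups> child in one go. The following is only a sketch using SimpleXML's xpath(), not the method used in the code in this post. The sample fragment and the helper name direct_replies() are assumptions.

```php
<?php
// Hypothetical fragment showing a message with two <followups> elements;
// the subjects are invented.
$doc = <<<XML
<message>
  <subject>Original</subject>
  <followups>
    <message><subject>Re: Original</subject></message>
  </followups>
  <followups>
    <message><subject>Original (subject match only)</subject></message>
  </followups>
</message>
XML;

// Collect the direct replies from every <followups> child of a message.
function direct_replies(SimpleXMLElement $msg): array
{
    $replies = $msg->xpath('followups/message');
    return is_array($replies) ? $replies : [];
}

$msg = simplexml_load_string($doc);
$subjects = [];
foreach (direct_replies($msg) as $reply) {
    $subjects[] = (string) $reply->subject;
}
print_r($subjects);
```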

The nested structure can be parsed in PHP thus:

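// Load the PEAR class and instantiate the unserializer
require_once 'XML/Unserializer.php';
$Unserializer = new XML_Unserializer();

// Unserialize the XML; TRUE means $doc is the name of a file
$status = $Unserializer->unserialize($doc, TRUE);

// Fetch the resulting data structure
$data = $Unserializer->getUnserializedData();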

$list = $data->list;

echo "<ul>\n";

foreach ($list as $msgobj) {
  parsemsgobj ($msgobj);
}

echo "</ul>\n";

###########################################
function parsemsgobj ($msgobj) {
  if ($msgobj->type == 'empty') {
    echo "<li style=\"list-style-type: none\">\n";
  } else {
    echo "<li>\n" . $msgobj->subject . ' (' . $msgobj->fromname . ")\n";
  }
  if (is_array($msgobj->followups)) {
    foreach ($msgobj->followups as $thisobj) {
      if (is_object($thisobj)) {
        echo "<ul>\n";
        parsemsgobj ($thisobj);
        echo "</ul>\n";
      }
      elseif (is_array($thisobj)) {
        foreach ($thisobj as $thisobj2) {
          if (is_object($thisobj2)) {
            echo "<ul>\n";
            parsemsgobj ($thisobj2);
            echo "</ul>\n";
          }
        }
      }
    }
  }
  echo "</li>\n";
}

I abandoned this approach because, whilst the structure can be parsed in
PHP, it is less easy to just hand the data structure to a Smarty
template and let that do the work.

My second attempt creates a flat list of all messages in thread order.
Each message carries its thread depth. The resulting data structure,
when pulled into PHP, can be walked in Smarty without too much difficulty.

#### PHP
<?php
// Load the PEAR class and instantiate the unserializer
require_once 'XML/Unserializer.php';
$Unserializer = new XML_Unserializer();

// Unserialize the XML; TRUE means $doc is the name of a file
$status = $Unserializer->unserialize($doc, TRUE);

$data = $Unserializer->getUnserializedData();

$msglist = $data->list;
$template = new template('mhonarcxmlmodule','_viewindex',$loc);
$template->assign('msglist',$msglist);

?>


#### Smarty
{assign var=lastDepth value=0}
{assign var=maxdepth value=4}
{assign var=start value=1}
<div id="mh_threadlist">

<ul>
{foreach from=$msglist item=msg}
  {if $msg->depth < $maxdepth}
    {assign var=currentdepth value=$msg->depth}
  {else}
    {assign var=currentdepth value=$maxdepth}
  {/if}

  {if $start==1}
    {assign var=start value=0}
    {if $currentdepth > 0}
      <li class="threadstartindent" >
      <span class="continued">{$msg->tsubject} continued</span><ul>
      {section name=foo loop=$currentdepth-1}
         <li class="threadstartindent" ><ul>
      {/section}
    {/if}
  {else}
    {if $currentdepth == $lastDepth}
      </li>
    {elseif $currentdepth > $lastDepth}
      <ul>
    {else}
      {section name=guff loop=$lastDepth-$currentdepth}
        </li></ul>
      {/section}
      </li>
    {/if}
  {/if}

  {if $msg->current == 'Yes'}
    <li><span class="urhere">{$msg->subject}</span><br />
    <span class="msgauthor">{$msg->fromname}</span>
  {else}
    <li><a href="blah" >{$msg->subject}</a><br />
    <span class="msgauthor">{$msg->fromname}</span>
  {/if}
  {assign var=lastDepth value=$currentdepth}
{/foreach}

{section name=guff2 loop=$lastDepth}</li></ul>{/section}
</li></ul>

</div>
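
For anyone not using Smarty, the depth-tracking idea in the template can be sketched in plain PHP. This is a simplified illustration rather than a translation of the template: it omits the CSS classes, links and "continued" label, and the function name render_threads() and the array keys 'subject' and 'depth' are assumptions.

```php
<?php
// Render a flat, thread-ordered list of messages as nested <ul> markup.
// Each entry is assumed to be an array with 'subject' and 'depth' keys.
function render_threads(array $msglist, int $maxDepth = 4): string
{
    $html = "<ul>\n";
    $lastDepth = null; // null until the first message has been emitted
    foreach ($msglist as $msg) {
        // Cap the nesting, as the template does with its maximum depth.
        $depth = min((int) $msg['depth'], $maxDepth);
        if ($lastDepth === null) {
            // A page may start mid-thread: open wrappers down to this depth.
            $html .= str_repeat("<li><ul>\n", $depth);
        } elseif ($depth > $lastDepth) {
            // Deeper: open nested lists inside the still-open <li>.
            $html .= "<ul>\n" . str_repeat("<li><ul>\n", $depth - $lastDepth - 1);
        } else {
            // Same depth or shallower: close deeper levels, then the sibling.
            $html .= str_repeat("</li></ul>\n", $lastDepth - $depth) . "</li>\n";
        }
        $html .= '<li>' . htmlspecialchars($msg['subject']) . "\n";
        $lastDepth = $depth;
    }
    if ($lastDepth !== null) {
        // Close whatever is still open after the last message.
        $html .= str_repeat("</li></ul>\n", $lastDepth) . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Invented sample data: one thread with two replies, then a second thread.
$html = render_threads([
    ['subject' => 'A', 'depth' => 0],
    ['subject' => 'B', 'depth' => 1],
    ['subject' => 'C', 'depth' => 1],
    ['subject' => 'D', 'depth' => 0],
]);
echo $html;
```

Keeping the open and close counts driven by a single remembered depth is what lets a flat list produce balanced markup, including when the first message on a page is already some way down a thread.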

There are still some issues to iron out. Control characters turn up in
the XML, which I've tackled by cleaning the data in the PHP app before
trying to parse it. I've converted an archive of around 43,000 pages to
this format, and indexing it has thrown up 56 pages where the XML fails
to parse. A quick initial look suggests that a significant proportion of
these involve messages containing attached messages, so I guess this is
something to look at.
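
As an illustration of that kind of pre-processing (not the actual code from my app), a minimal sketch might strip the control characters that XML 1.0 does not allow before the document reaches the parser. The function name strip_xml_control_chars() and the sample string are invented.

```php
<?php
// Remove characters XML 1.0 forbids: the C0 controls other than
// tab (0x09), line feed (0x0A) and carriage return (0x0D).
function strip_xml_control_chars(string $xml): string
{
    // Fall back to the input unchanged if preg_replace() reports an error.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml) ?? $xml;
}

// Invented sample containing a BEL and an ESC character.
$dirty = "<subject>Hello\x07 world\x1B</subject>";
$clean = strip_xml_control_chars($dirty);
echo $clean, "\n";
```

The pattern works bytewise, which is safe for UTF-8 input because bytes below 0x20 never occur inside a multibyte sequence.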

By the way, the reason for the

<ATTACHMENTURL>
%ATTACHMENTURLBASE%
</ATTACHMENTURL>

is to allow a str_replace() to be used to set the attachment url.
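
A minimal sketch of that substitution follows. The base URL is a made-up example and $xml stands in for whatever string holds the page's XML.

```php
<?php
// Hypothetical fragment of archive output containing the placeholder.
$xml = "<ATTACHMENTURL>\n%ATTACHMENTURLBASE%\n</ATTACHMENTURL>";

// Hypothetical base URL chosen by the consuming application.
$attachmentBase = 'https://example.org/archive/attachments';

// Swap the placeholder for the real base URL before parsing.
$xml = str_replace('%ATTACHMENTURLBASE%', $attachmentBase, $xml);
echo $xml, "\n";
```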


-- 
Chris Hastie
[Attached resource files: xml.mrc, xml-nested.mrc, xml_author.mrc, xml_subject.mrc. The resource element markup was stripped in this archived copy.]