Re: Large XML manipulation within PHP

In that case you may want to try XMLReader as it doesn't load all XML into memory.
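
As a rough illustration of the streaming approach (a sketch only, assuming the items are elements named "item" with an "id" attribute; listItemIds is a made-up helper name, not something that exists in PHP), XMLReader can walk the file node by node and keep nothing but the IDs in memory:

```php
<?php
// Hypothetical helper: collect the id attribute of every <item> element
// while streaming through the file, without building the full tree.
function listItemIds(string $path): array
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $ids = [];
    // read() advances one node at a time; only the current node is held.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
            $ids[] = $reader->getAttribute('id');
        }
    }
    $reader->close();

    return $ids;
}
```

A pass like this is cheap, so it can be used first to find out which IDs exist before deciding how to split the file.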

If that doesn't help, then you will need to write a custom parser for your needs: use XMLReader to read through the whole XML file, split it into chunks of e.g. 5000 items each, and store those chunks on disk.

Then use SimpleXML to read and manipulate those chunks and save them back to disk.

It would help if you could provide a mockup of the XML, e.g.:
<feed>
 <item id='1'>
  .......
</item>
<item id='2'>
  .......
</item>
<item id='3'>
  .......
</item>
....
<item id='278172'>
</item>
</feed>

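<?php

// This will create the files xml-1.xml, xml-2.xml, etc.
// (makeChunksWithXmlReader still has to be written.)
makeChunksWithXmlReader($pathToLargeXmlFile, CustomXmlManipulator::$SPLITAT);


class CustomXmlManipulator {
    // Number of items stored in each chunk file
    public static $SPLITAT = 5000;

    // Load the chunk that contains the item with the given ID
    public function getXmlChunk($id) {
        return simplexml_load_file($this->getXmlFile($id));
    }

    // Write a modified chunk back to the file that holds the given ID
    public function storeXml($id, $simpleXmlObject) {
        $file = $this->getXmlFile($id);
        file_put_contents($file, $simpleXmlObject->asXML());
        // Drop our reference; the caller must also unset its own
        // variable before the memory can actually be freed
        $simpleXmlObject = null;
    }

    // Map an item ID to its chunk file. Assumes IDs are sequential and
    // start at 1, so IDs 1..5000 are in xml-1.xml, 5001..10000 in xml-2.xml
    public function getXmlFile($id) {
        $chunk = (int)(($id - 1) / self::$SPLITAT) + 1;
        return 'xml-' . $chunk . '.xml';
    }
}


$XMLM = new CustomXmlManipulator();
$first = $XMLM->getXmlChunk(1);

foreach ($first as $x) {
    ....
    .....
    if (something) {
        // here you need to manipulate ID 23493
        $tmpX = $XMLM->getXmlChunk(23493);
        $tmpX->....  = .....;  // change XML
        $XMLM->storeXml(23493, $tmpX);
    }
}

?>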


This is just the basic logic; it can be extended further, depending on your needs. The function makeChunksWithXmlReader needs to go through the XML file and write the chunks to disk.
More on XMLReader: http://www.php.net/manual/en/class.xmlreader.php
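
For what it's worth, here is one way makeChunksWithXmlReader could look. Treat it as a sketch under assumptions, not a finished implementation: it assumes the root element is "feed", the items are direct "item" children as in the mockup above, no namespaces are declared on the root, and the optional third parameter $outDir is something I added for illustration. It returns the number of chunk files written.

```php
<?php
// Sketch: split a large <feed> file into xml-1.xml, xml-2.xml, ...
// with at most $splitAt <item> elements per file.
function makeChunksWithXmlReader(string $path, int $splitAt, string $outDir = '.'): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $chunk  = 0;   // number of chunk files written so far
    $buffer = [];  // serialized <item> elements of the current chunk

    // Write the buffered items to the next chunk file, wrapped in <feed>.
    $flush = function () use (&$chunk, &$buffer, $outDir) {
        if ($buffer === []) {
            return;
        }
        $chunk++;
        $xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed>\n"
             . implode("\n", $buffer) . "\n</feed>\n";
        file_put_contents("$outDir/xml-$chunk.xml", $xml);
        $buffer = [];
    };

    // Skip ahead to the first <item>.
    while ($reader->read() && $reader->name !== 'item') {
    }

    // Copy one <item> at a time; next('item') jumps to the following
    // <item>, skipping over the subtree of the current one.
    while ($reader->name === 'item') {
        $buffer[] = $reader->readOuterXml();
        if (count($buffer) >= $splitAt) {
            $flush();
        }
        $reader->next('item');
    }

    $flush(); // write the last, possibly partial, chunk
    $reader->close();

    return $chunk;
}
```

Only one chunk's worth of items is held in memory at any time, so the peak usage depends on $splitAt rather than on the size of the source file. Note that the ID-to-file mapping in getXmlFile only matches these chunks if the IDs in the source are sequential and start at 1.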





On Apr 23, 2008, at 10:41 PM, Steve Gula wrote:

I could but it would make things very difficult. Some of the entities around id # 100 could be affected by entities around id #11000 and would result in a file needing to be manipulated at the same time. Unfortunately, I don't think this is a top to bottom change for the information at hand.

On Wed, Apr 23, 2008 at 4:36 PM, Bastien Koert <phpster@xxxxxxxxx> wrote:



On 4/23/08, Steve Gula <sg-lists@xxxxxxxxxxxxx> wrote:

I work for a company that has chosen to use XML (Software AG Tamino XML database) as its storage system for an enterprise application. We need to make a system wide change to information within the database that isn't feasible to do through our application's user interface. My solution was to unload the XML collection in question, open it, manipulate it, then write it back out. Problem is it's a 230+MB file and even with PHP's max mem set to 4096MB (of 8GB available to the system) SimpleXML claims to still run out of memory. Can anyone recommend a better way for handling a large amount of XML data? Thanks.

--
--Steve Gula

(this email address is used for list communications only, direct contact at
this email address is not guaranteed to be read)


Can you chunk the data in any way, break it into smaller, more manageable pieces?

--

Bastien

Cat, the other other white meat





Bojan Tesanovic
http://www.carster.us/




