On 26 Sep 2011, at 17:24, Richard Quadling wrote:

> I've got a project which will need to iterate over some very large
> XML files (around 250 files ranging in size from around 50MB to
> several hundred MB - 2 of them are in excess of 500MB).
>
> The XML files have a root node and then a collection of products. In
> total, across all the files, there are going to be several million
> product details. Each XML feed will have a different structure as it
> relates to a different source of data.
>
> I plan to have an abstract reader class with the concrete classes
> being extensions of this, each covering the specifics of the format
> being received and able to return a standardised view of the data
> for importing into MySQL and eventually MongoDB.
>
> I want to use an XML iterator so that I can say something along the
> lines of ...
>
> 1 - Instantiate the XML iterator with the XML's URL.
> 2 - Iterate the XML, getting back one node at a time without keeping
> all the nodes in memory.
>
> e.g.
>
> <?php
> $o_XML = new SomeExtendedXMLReader('http://www.site.com/data.xml');
> foreach ($o_XML as $o_Product) {
>     // Process product.
> }
>
> Add to this that some of the XML feeds come as .gz files; I want to
> be able to stream the XML out of the .gz file without having to
> extract the entire file first.
>
> I've not got access to the XML feeds yet (they are coming from the
> various affiliate networks around, and I'm a remote user so I need
> to get credentials and the like).
>
> If you have any pointers on the capabilities of the various XML
> reader classes, based upon this scenario, then I'd be very grateful.
>
> In this instance, the memory limitation is important. The current
> code is string-based and, whilst it works, you can imagine the
> complexity of it.
>
> The structure of each product internally will be different, but I
> will be happy to get back a nested array or an XML fragment, as long
> as the iterator is only holding onto one array/fragment at a time
> and not caching the massive number of products per file.

As far as I'm aware, XML Parser can handle all of this: http://php.net/xml

It's a SAX parser, so you can feed it the data chunk by chunk. You can use gzopen to open gzipped files and manually feed the data into xml_parse. Be sure to read the docs carefully, because there's a lot to be aware of when parsing an XML document in pieces.

-Stuart

--
Stuart Dallas
3ft9 Ltd
http://3ft9.com/
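
For reference, here is a minimal, untested sketch of the chunk-by-chunk approach Stuart describes, using ext/xml with gzopen(). The file name, the 8KB chunk size and the <product> element name are placeholders, the handlers are left as stubs, and fetching the feed over HTTP (rather than reading a local .gz file) is left aside.

<?php
// Handlers for the SAX-style parse. Note that ext/xml folds element
// names to upper case by default, hence the 'PRODUCT' comparisons.
function startElement($parser, $name, $attrs)
{
    if ($name === 'PRODUCT') {
        // Begin collecting the data for a single product here.
    }
}

function endElement($parser, $name)
{
    if ($name === 'PRODUCT') {
        // A complete product has now been seen: hand it off to the
        // importer and discard it, so only one product is ever in memory.
    }
}

function characterData($parser, $data)
{
    // Accumulate text for whichever element is currently open.
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'startElement', 'endElement');
xml_set_character_data_handler($parser, 'characterData');

// gzopen() inflates .gz data transparently (and reads plain files as-is),
// so the XML is decompressed as it is read rather than extracted first.
$fp = gzopen('data.xml.gz', 'rb');
if ($fp === false) {
    die('Unable to open feed');
}

while (!gzeof($fp)) {
    $chunk = gzread($fp, 8192);
    // The third argument tells the parser when the final chunk arrives.
    if (!xml_parse($parser, $chunk, gzeof($fp))) {
        die(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
}

gzclose($fp);
xml_parser_free($parser);

Because xml_parse() only ever sees one small chunk and each product is discarded as soon as its closing tag is handled, memory use stays flat regardless of the size of the feed. Wrapping this in an Iterator to get the foreach interface from the original post is then mostly a matter of buffering exactly one product between handler calls.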