On 29 September 2011 23:34, Tommy Pham <tommyhp2@xxxxxxxxx> wrote:
> On Thu, Sep 29, 2011 at 3:27 PM, Tommy Pham <tommyhp2@xxxxxxxxx> wrote:
>>
>> On Thu, Sep 29, 2011 at 9:09 AM, Richard Quadling <rquadling@xxxxxxxxx> wrote:
>>>
>>> Hi.
>>>
>>> I'm looking to process very large XML files without needing to
>>> download them first.
>>>
>>> To that end,
>>> SimpleXMLIterator('compress.zlib://http://www.site.com/products.xml.gz')
>>> is working perfectly.
>>>
>>> But a downside is that I have no information about my progress.
>>>
>>> Is there any mechanism available to get a position within the XML stream?
>>>
>>> I can use libxml_set_streams_context() to set a context (so I can
>>> provide POST data if needed for the HTTP request), but I can't see how
>>> to gain access to the stream within the libxml code (SimpleXML uses
>>> libxml).
>>>
>>> At the most basic, what I'm looking for is to be able to record a
>>> percentage complete. Even with compression, I'll be reading some bytes
>>> from a stream (either from http or from compress.zlib) and I want to
>>> know where I am in that stream.
>>>
>>> The HTTP header will tell me how big the file is (so that can easily
>>> be a HEAD request to get that data).
>>>
>>> Even if I DO save the file locally first, I still can't get a position.
>>>
>>> If I use the SimpleXMLIterator::count() method, I am unsure as to what
>>> will happen if I am using a stream (rather than a local file). If I
>>> use ...
>>>
>>> $xml = new SimpleXMLIterator(...);
>>> $items = $xml->count();
>>> foreach ($xml as $s_Tag => $o_Item) {
>>>     ...
>>> }
>>>
>>> will the XML file be cached somewhere? Or will that depend upon the
>>> originating server supporting some sort of rewind/chunk mechanism?
>>>
>>> Any suggestions/ideas?
>>>
>>> Richard.
>>>
>>> --
>>> Richard Quadling
>>> Twitter : EE : Zend : PHPDoc
>>> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
>>>
>>
>> Richard,
>>
>> The only thing I can think of is to break it up into two parts. The
>> first part uses cURL for the download stream and monitors its progress.
>> The second part is the XML parsing. If you want to do both
>> simultaneously, and only if the remote server supports chunked
>> encoding/transfer, you could use multiple cURL handles feeding a
>> numerically indexed array containing the chunks for XML parsing. This
>> could be memory intensive if the file is very large, depending on
>> whether you free the elements of the array as you progress. So if
>> memory consumption is a major concern, you may want to save the file
>> locally until the import/migration is done, as there may be a situation
>> where you'll need to review the data, thus saving the time and
>> bandwidth of having to re-download it.
>>
>> Regards,
>> Tommy
>
> An afterthought: handling the stream and parsing at the same time may
> require sophisticated XML node validation, as a node may be split
> between the transferred chunks.
>

SimpleXMLIterator() does a superb job of providing one node at a time and
doesn't remember each node automatically; if I do remember them, that's my
issue. But accessing the stream/file metadata doesn't seem to be possible.

--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
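
For what it's worth, here is a rough, untested sketch of the
percentage-complete idea using the http wrapper's notification callback. It
assumes the callback still fires when libxml pulls the data through the
context set with libxml_set_streams_context() and through the
compress.zlib:// wrapper (I have not verified either), and the feed URL is
just the placeholder from the post above:

<?php
$totalBytes = 0;

// The notification reports *compressed* bytes, so the percentage tracks the
// download, not the decompressed XML.
$notifier = function ($code, $severity, $message, $messageCode, $transferred, $max) use (&$totalBytes) {
    switch ($code) {
        case STREAM_NOTIFY_FILE_SIZE_IS:
            $totalBytes = $max; // Content-Length, if the server sent one
            break;
        case STREAM_NOTIFY_PROGRESS:
            if ($totalBytes > 0) {
                printf("%.1f%% (%d of %d bytes)\r",
                    100 * $transferred / $totalBytes, $transferred, $totalBytes);
            }
            break;
    }
};

$context = stream_context_create(
    array('http' => array('method' => 'GET')),
    array('notification' => $notifier)
);

// Hand the context to libxml so SimpleXML uses it for the underlying stream.
libxml_set_streams_context($context);

$xml = new SimpleXMLIterator(
    'compress.zlib://http://www.site.com/products.xml.gz',
    0,
    true // treat the first argument as a URL, not literal XML
);

foreach ($xml as $s_Tag => $o_Item) {
    // process one node at a time
}
?>

If the server omits Content-Length (or uses chunked transfer), $totalBytes
stays at 0 and you'd have to fall back to reporting raw bytes transferred.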
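
And a sketch of Tommy's two-step suggestion, equally untested: download the
feed to a local file with cURL while reporting progress, then parse the local
copy. The URL and temp-file handling are placeholders; note that from PHP 5.5
the cURL handle is prepended as an extra first argument to the progress
callback.

<?php
$url   = 'http://www.site.com/products.xml.gz';
$local = tempnam(sys_get_temp_dir(), 'feed');

$fp = fopen($local, 'wb');
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_NOPROGRESS, false); // required, or the callback never fires
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION,
    function ($downloadSize, $downloaded, $uploadSize, $uploaded) {
        if ($downloadSize > 0) {
            printf("%.1f%% (%d of %d bytes)\r",
                100 * $downloaded / $downloadSize, $downloaded, $downloadSize);
        }
    }
);

curl_exec($ch);
curl_close($ch);
fclose($fp);

// Parse the local copy through the same compress.zlib:// wrapper.
$xml = new SimpleXMLIterator('compress.zlib://' . $local, 0, true);
foreach ($xml as $s_Tag => $o_Item) {
    // process one node at a time
}

unlink($local);
?>

Keeping the downloaded file around (rather than unlinking it straight away)
would also cover Tommy's point about being able to review the data without
re-downloading it.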