On 15Apr2012 14:30, Amadeus W.M. <amadeus84@xxxxxxxxxxx> wrote:
| > Look at this (completely untested) loop:
| >
| >   # a little setup
| >   cmd=`basename "$0"`
| >   : ${TMPDIR:=/tmp}
| >   tmppfx=$TMPDIR/$cmd.$$
| >
| >   i=0
| >   while read -r url
| >   do
| >     i=$((i+1))
| >     out=$tmppfx.$i
| >     if curl -s "$url" >"$out"
| >     then echo "$out"
| >     else echo "$cmd: curl fails on: $url" >&2 fi &
| >   done < myURLs \
| >   | while read -r out
| >     do
| >       cat "$out"
| >       rm "$out"
| >     done \
| >   | tee all-data.out \
| >   | your-data-parsing-program
|
|
| I understand the script, although I haven't tested it either. My take on
| it:
| + it solves the problem of curls overwriting (I think)
| + the data parsing and tracking is done on the combined curls

Yes.

| - it retrieves the urls serially, not in parallel

No, in parallel. There is an "&" after the "fi" in the if.
It looks like the "fi &" got sucked onto the end of an echo statement.
It should be on its own line.

| - it writes them to disk

Just long enough to be read and catted, then removed.

| - it re-reads them from disk, hence some disk activity, although
| probably insignificant relative to the download time.

Should be, yes.

| The way I'm doing it now is this: I do the retrieval and the parsing and
| tracking all within a single program. For each url I create a separate
| thread from which I call curl and get its output, then parse.
| Like this:
|
| // inside each thread: [... popen(curl...) ...]
| // when threads done, analyze the combined info.
|
| This works, but I would have liked a more modular solution. I want the url
| retrieval to be a separate, standalone entity and the parsing and
| tracking another entity (possibly two entities). Hence, what I want is
|
| - in a shell
| - download in parallel
| - merge curl outputs

My above loop tries to do that. The curls do run in parallel.

| then pipe into the parser/tracker. Parsing can be done per url, but
| tracking MUST be across urls.
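If you want to paste it, here is a sketch of the quoted loop with the "fi &" moved onto its own line. With that change, each curl and its echo run together as one background job. This is not the original post verbatim. The `fetch_all` wrapper name is hypothetical, `${0##*/}` replaces the backquoted `basename`, and the failure branch also removes the empty temp file. The original would leave that file behind, because the reader never learns its name.

```shell
#!/bin/sh
# Sketch of the loop above with "fi &" on its own line, so the whole
# if/then/else (curl plus its echo) is one background job per URL.

cmd=${0##*/}            # script name, without calling basename
: "${TMPDIR:=/tmp}"
tmppfx=$TMPDIR/$cmd.$$

# fetch_all URLFILE   (hypothetical wrapper name, not from the thread)
# Starts one background curl per line of URLFILE and writes each download
# to stdout as soon as that download has finished.
fetch_all() {
  i=0
  while read -r url
  do
    i=$((i+1))
    out=$tmppfx.$i
    if curl -s "$url" >"$out"
    then echo "$out"
    else echo "$cmd: curl fails on: $url" >&2
         rm -f "$out"   # addition: discard the partial file on failure
    fi &
  done < "$1" \
  | while read -r out
    do
      cat "$out"
      rm "$out"
    done
}

# Usage, as in the thread (your-data-parsing-program is a placeholder):
#   fetch_all myURLs | tee all-data.out | your-data-parsing-program
```

Each download is printed as it completes. The merged stream therefore comes out in completion order, not in the order of myURLs. The reader loop keeps going until every background job has closed its end of the pipe, so all downloads reach the last stage of the pipeline. There is no cap on concurrency: a long URL list means that many simultaneous curls.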
That should work; your parser comes at the end of the pipeline.

Cheers,
--
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

[Alain] had been looking at his dashboard, and had not seen me, so I ran
into him.
        - Jean Alesi on his qualifying prang at Imola '93
--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org