Re: off topic: combined output of concurrent processes

On 15Apr2012 14:30, Amadeus W.M. <amadeus84@xxxxxxxxxxx> wrote:
| > Look at this (completely untested) loop:
| > 
| >   # a little setup
| >   cmd=`basename "$0"`
| >   : ${TMPDIR:=/tmp}
| >   tmppfx=$TMPDIR/$cmd.$$
| > 
| >   i=0
| >   while read -r url
| >   do
| >     i=$((i+1))
| >     out=$tmppfx.$i
| >     if curl -s "$url" >"$out"
| >     then  echo "$out"
| >     else  echo "$cmd: curl fails on: $url" >&2 fi &
| >   done < myURLs \
| >   | while read -r out
| >     do
| >       cat "$out"
| >       rm "$out"
| >     done \
| >   | tee all-data.out \
| >   | your-data-parsing-program
| 
| 
| I understand the script, although I haven't tested it either. My take on 
| it: 
| 	+ it solves the problem of curls overwriting (I think) 
| 	+ the data parsing and tracking is done on the combined curls

Yes.

| 	- it retrieves the urls serially, not in parallel

No, in parallel. There is an "&" after the "fi" in the if.

It looks like the "fi &" got sucked onto the end of the echo statement
in the else branch. It should be on its own line.
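
For reference, here is a minimal sketch of the same loop with the "fi &"
back on its own line. The fetch_all and merge function names and the FETCH
override are hypothetical, added only so the loop can be tried without the
network. The trailing wait and the rm -f on failure are also additions of
this sketch, not part of the loop quoted above.

```shell
#!/bin/sh
# Sketch only: fetch_all, merge and FETCH are hypothetical names.

fetch_all() {
  # Read URLs from stdin; print the name of each completed temp file.
  tmppfx=${TMPDIR:-/tmp}/fetch_all.$$
  i=0
  while read -r url
  do
    i=$((i+1))
    out=$tmppfx.$i
    # FETCH defaults to "curl -s"; set FETCH=cat to try it on local files.
    if ${FETCH:-curl -s} "$url" >"$out"
    then  echo "$out"
    else  echo "fetch_all: fetch fails on: $url" >&2
          rm -f "$out"   # extra cleanup, not in the quoted loop
    fi &                 # "&" backgrounds the whole if, so fetches overlap
  done
  wait                   # let every background fetch finish before returning
}

merge() {
  # Cat and remove each temp file as its name arrives on stdin.
  while read -r out
  do
    cat "$out"
    rm "$out"
  done
}
```

With those two functions defined, the pipeline would read:

  fetch_all < myURLs | merge | tee all-data.out | your-data-parsing-program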

| 	- it writes them to disk

Just long enough to be read and catted, then removed.

| 	- it re-reads them from disk, hence some disk activity, although 
| probably insignificant relative to the download time.

Should be, yes.

| The way I'm doing it now is this: I do the retrieval and the parsing and 
| tracking all within a single program. For each url I create a separate 
| thread from which I call curl and get its output, then parse.
| Like this:
| 
| // inside each thread:
[... popen(curl...) ...]
| // when threads done, analyze the combined info.
| 
| This works, but I would have liked a more modular solution. I want the url 
| retrieval to be a separate, standalone entity and the parsing and 
| tracking another entity (possibly two entities). Hence, what I want is 
| 
| - in a shell
| 	- download in parallel 
| 	- merge curl outputs

My above loop tries to do that. The curls do run in parallel.
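
One quick way to convince yourself is a sketch like this, with sleep
standing in for curl. The 2 second delay and the job count are arbitrary
choices for the illustration. If the "if" commands ran one after another
the loop would take about 6 seconds; backgrounded, it should take about 2.

```shell
#!/bin/sh
# Sketch only: sleep stands in for curl; the delay and count are arbitrary.
start=$(date +%s)
for n in 1 2 3
do
  if sleep 2
  then  echo "job $n done"
  fi &          # background the whole if, as in the download loop
done
wait            # block until all three background jobs have exited
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```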

| then pipe into the parser/tracker. Parsing can be done per url, but 
| tracking MUST be across urls. 

That should work; your parser comes at the end of the pipeline.

Cheers,
-- 
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

[Alain] had been looking at his dashboard, and had not seen me, so I
ran into him. - Jean Alesi on his qualifying prang at Imola '93