Re: off topic: combined output of concurrent processes

> 
> Look at this (completely untested) loop:
> 
>   # a little setup
>   cmd=`basename "$0"`
>   : ${TMPDIR:=/tmp}
>   tmppfx=$TMPDIR/$cmd.$$
> 
>   i=0
>   while read -r url
>   do
>     i=$((i+1))
>     out=$tmppfx.$i
>     if curl -s "$url" >"$out"
>     then  echo "$out"
>     else  echo "$cmd: curl fails on: $url" >&2; fi &
>   done < myURLs \
>   | while read -r out
>     do
>       cat "$out"
>       rm "$out"
>     done \
>   | tee all-data.out \
>   | your-data-parsing-program


I understand the script, although I haven't tested it either. My take on 
it: 
	+ it solves the problem of the curl outputs overwriting each other (I think)
	+ the data parsing and tracking are done on the combined curl output
	- it retrieves the URLs serially, not in parallel
	- it writes them to disk
	- it re-reads them from disk, which adds some disk activity, although 
probably an insignificant amount relative to the download time.


The way I'm doing it now is this: I do the retrieval, the parsing, and the 
tracking all within a single program. For each URL I create a separate 
thread, which runs curl, reads its output, and then parses it. Like this:

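// inside each thread (needs <stdio.h> for popen/fread/pclose):

const size_t bufSize_ = 1 << 20;   // 1 MiB, assumed large enough for one URL
char buf_[bufSize_ + 1];           // local to each thread; +1 for the '\0'

// form the curl command, quoting the URL because URLs often contain '&':
// curl_command = "curl -s '$url'"

FILE *fd_ = popen(curl_command, "r");   // error checking omitted here
size_t nRead_ = fread(buf_, sizeof(char), bufSize_, fd_);  // anything past bufSize_ is lost
pclose(fd_);
buf_[nRead_] = '\0';               // fread does not NUL-terminate the data

parse(buf_);                  // into a struct visible from all threads;
                              // concurrent updates to it need a mutex


// when all threads are done, analyze the combined info.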


This works, but I would have preferred a more modular solution. I want the URL 
retrieval to be a separate, standalone entity, and the parsing and 
tracking to be another entity (possibly two entities). Hence, what I want is:

- in a shell
	- download in parallel 
	- merge curl outputs

then pipe the result into the parser/tracker. Parsing can be done per URL, but 
tracking MUST be across URLs. 
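A minimal sketch of that shell side, assuming a POSIX shell with mktemp(1) available (as on Fedora). The function name fetch_merged, the FETCH override variable, and the "=== <url>" marker line are hypothetical choices for illustration, not an existing tool. Each URL is downloaded by its own background job into its own temp file, the function waits for all of them, and then it writes the outputs to stdout in input order, each preceded by a marker line so the parser/tracker can tell where one URL's data ends and the next begins.

```shell
#!/bin/sh
# fetch_merged (hypothetical helper): reads one URL per line on stdin,
# downloads all of them in parallel, then writes every response to stdout
# in input order, each preceded by a "=== <url>" marker line.
# FETCH (assumption) is the per-URL command; it defaults to "curl -s"
# (add -f to make curl treat HTTP errors as failures).
fetch_merged() {
    fetch=${FETCH:-curl -s}
    dir=$(mktemp -d) || return 1
    n=0
    # "|| [ -n ... ]" also handles a last line with no trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        n=$((n + 1))
        printf '%s\n' "$url" > "$dir/$n.url"
        # one background job per URL, each writing to its own file,
        # so concurrent downloads never interleave their output
        ( $fetch "$url" > "$dir/$n.out" ||
            printf 'fetch failed: %s\n' "$url" >&2 ) &
    done
    wait    # block until every background download has finished
    i=1
    while [ "$i" -le "$n" ]; do
        printf '=== %s\n' "$(cat "$dir/$i.url")"
        cat "$dir/$i.out"
        # make sure the next marker starts on its own line
        if [ -n "$(tail -c 1 "$dir/$i.out")" ]; then echo; fi
        i=$((i + 1))
    done
    rm -rf "$dir"
}
```

With it, the pipeline from the quoted message would become "fetch_merged < myURLs | tee all-data.out | your-data-parsing-program". Unlike the quoted loop, which emits each result as soon as its curl finishes, this version holds everything until the slowest download is done, in exchange for a deterministic order. It also starts one job per URL with no upper bound; for a long URL list, xargs -P (GNU and BSD xargs) is a common way to cap the number of concurrent curl processes. The "=== " marker assumes no response contains a line starting that way, so pick a marker that cannot occur in your data.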


-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org

