On 15Apr2012 05:52, Amadeus W.M. <amadeus84@xxxxxxxxxxx> wrote:
| With this exact script, it works for FOO (probably because it's short).
| For FOOOOOOOOOO...(1000 Os) I see again fewer than 100 lines in "zot".
| This, if I iterate 100 times. If I iterate, say, 10-20 times only, I seem
| to get all the lines. Can it have something to do with the number of jobs
| executed in the background?
|
| The real code is like this:
|
| #!/bin/bash
| for url in $(cat myURLs)
| do
|   curl -s $url &
| done

[ Potential solution at bottom of post. ]

Ok, another question: is curl using ONLY the -s (silent) option? And
specifically, is it NOT using -C (continue)?

Curl can behave differently if its output is not a terminal. Aside from
turning off the progress meter if the output _is_ a terminal, with -C it
uses the output file to figure out how much data to request; if it thinks
the file is already partially fetched it won't fetch the front part.

Finally, it is conceivable that curl might seek() the file. The reason I
suggest this is that you said using >> made it good. Normally (with a
single output file, opened just the once) they would behave the same. But
suppose curl, internally, is seek()ing to a particular position. With a
file opened for append that does nothing (the next write will go at the
end of the file anyway), but if not, the seek would reposition the file
pointer and overwrites would occur. Curl's got no good reason to do that
(even with a -C option), but it might; if we suspect this, some tests
using the strace command can tell us.

However, we can work around this whole issue and solve two problems:

  - the sharing of the output file, which we _suspect_ may be triggering
    bad behaviour from curl

  - the possible interleaving of curl outputs: curl _will_ get data from
    the URL in chunks, and parallel curls will interleave their output
    chunks

Look at this (completely untested) loop:

  # a little setup
  cmd=`basename "$0"`
  : ${TMPDIR:=/tmp}
  tmppfx=$TMPDIR/$cmd.$$

  i=0
  while read -r url
  do
    i=$((i+1))
    out=$tmppfx.$i
    if curl -s "$url" >"$out"
    then
      echo "$out"
    else
      echo "$cmd: curl fails on: $url" >&2
    fi &
  done < myURLs \
  | while read -r out
    do
      cat "$out"
      rm "$out"
    done \
  | tee all-data.out \
  | your-data-parsing-program

This program does a few things:

  - gives each curl its own output file, avoiding our issues
  - runs them all in parallel, achieving your aim
  - never interleaves one curl with another; the second loop reads each
    completed output file in turn, completely
  - takes a copy using tee to the file all-data.out, just so you can
    inspect it
  - uses the "run until EOF" approach, avoiding tricky games with "wait"
  - writes output filenames using echo to the pipe in parallel; the
    echoes _should_ all do single writes into the pipe to the second
    loop, and never interfere with each other in consequence

Shortcomings: if you have too many URLs you will run out of processes (or
available connections at the target web server) by running too many
curls at once, because it will read URLs and fork/exec curls as fast as
it can.

If this solves your problems in a fashion pleasing to your mind, we can
move to a more advanced token-based loop to keep a maximum number of
curls in play at any one time; a rough sketch of that follows below.
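In case it helps to picture it, here is a completely untested sketch of
that token idea. It reuses the $cmd/$tmppfx setup from above and replaces
only the first loop; it emits the same list of filenames on stdout, so it
can feed the same "while read -r out" consumer as before. The maxjobs
value, the FIFO name and the "token" string are arbitrary choices of
mine, not anything curl or the shell insists on. Each curl takes a token
from a FIFO before it starts and puts it back when it finishes, so at
most $maxjobs curls run at any one time:

  maxjobs=8                       # at most this many curls at once

  tokens=$tmppfx.tokens
  mkfifo "$tokens"
  exec 3<>"$tokens"               # open read/write so nothing blocks here
  rm "$tokens"                    # fd 3 stays usable after the unlink

  # prime the pipe with one token per permitted concurrent curl
  j=0
  while [ $j -lt $maxjobs ]
  do
    j=$((j+1))
    echo token >&3
  done

  i=0
  while read -r url
  do
    i=$((i+1))
    out=$tmppfx.$i
    read -r token <&3             # blocks while $maxjobs curls are running
    {
      if curl -s "$url" >"$out"
      then
        echo "$out"
      else
        echo "$cmd: curl fails on: $url" >&2
      fi
      echo "$token" >&3           # hand the token back for the next curl
    } &
  done < myURLs
  wait                            # let the last curls finish

If a curl gets killed before it hands its token back the pool quietly
shrinks by one; a trap would harden that, but the above shows the shape
of the thing.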
How's this do you?

Cheers,
--
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

You can't have everything... where would you put it?
        - Charles Robinson, cr0100@xxxxxxxxxxxxx