Hello
I have a large list of URLs (from a database, generated automatically
during tests) that I want to download using several wget processes at
the same time. With our internal web servers, this will be a lot faster
than downloading the pages one at a time with a single process.
So I create 20 pipes in my script with `mkfifo' and connect the read end
of each one to a new wget process for that fifo. The write end of each
pipe is then connected to my script, with shell commands like
`exec 18>>fifo_file_name'.
Then my script outputs, in a loop, one line with a URL to each of the
pipes in turn, and then starts over again with the first pipe, until
there are no more URLs from the database client.
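To make the question clearer, here is the same pattern condensed to just
two pipes (the real script below uses 20 pipes and computes the fd
numbers in a loop; the URLs are only placeholders):

  mkfifo /tmp/pipe_00 /tmp/pipe_01
  wget --server-response --spider --output-file=visit_00.log --input-file=/tmp/pipe_00 &
  exec 5>>/tmp/pipe_00
  wget --server-response --spider --output-file=visit_01.log --input-file=/tmp/pipe_01 &
  exec 6>>/tmp/pipe_01
  echo "http://$web_server/some_board" >&5
  echo "http://$web_server/other_board" >&6
  exec 5>&- 6>&-
  wait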
Much to my dismay I find that there is no concurrent / parallel download
with the child `wget' processes, and that for some strange reason only
one wget process can download pages at a time, and after that process
completes, another one can begin.
My script does feed *all* the pipes with data, one line to each pipe in
turn, and has all the pipes written and closed by the time the first
child process has even finished downloading.
Do you know why my child processes wait in turn for each other before
they start reading from their fifos and downloading?
I figure it must be something about the pipes, because if I use regular
files instead (and reverse the order: first write the URLs, then start
wget to read them) then the child processes run in parallel as expected.
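For example, with regular files the same test, reduced to two URL list
files (again with placeholder URLs), looks roughly like this, and then
both wget processes download at the same time:

  echo "http://$web_server/some_board" > urls_00
  echo "http://$web_server/other_board" > urls_01
  wget --server-response --spider --output-file=visit_00.log --input-file=urls_00 &
  wget --server-response --spider --output-file=visit_01.log --input-file=urls_01 &
  wait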
The child processes also run in parallel if I open the write ends of the
pipes first, and then start the wget processes for the read ends.
They even ran in parallel with my pipes, but I saw them run like that
only once in all my attempts. I do not know what was special about that
attempt; it happened at the beginning of the day, and the computers were
not restarted or logged off overnight.
The pipes are created and deleted on every run, with mkfifo and rm.
Is there something special about fifos that makes the readers run in
sequence if I open the read end first?
My script is attached below; I believe it is nicely formatted and clear
enough.
Thank you,
Timothy Madden
#!/bin/sh
set -e -x -v
set -o pipefail || true
# set some local defaults
web_server="${web_server:-'appserver'}"
db_server="${db_server:-'replication1'}"
database="${database:-'xe150'}"
cubrid="${cubrid:-'false'}"
# parse command line
case "$#" in
(0)
echo Syntax:
echo " $0" "webserver[:port] [ dbserver[:port] [db] ]"
echo " $0" "webserver[:port] db@dbserver[:port]"
echo
exit 1;;
(1)
if test "$1" = "--help" -o "$1" == "--usage"
then
"$0"
exit
else
web_server="$1"
fi;;
(2)
web_server="$1"
if echo "$2" | grep '@' -l>/dev/null
then
cubrid=true
db_server="$2"
database="$(echo "$2" | sed 's#^\([a-zA-Z_0-9]*\)@.*$#\1#')"
else
db_server="$2"
fi;;
(3)
web_server="$1"
db_server="$2"
database="$3";;
(*)
"$0"
exit 1;;
esac
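# split an optional :port suffix off the database server name (dbserver[:port])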
if echo "$db_server" | grep ':' -l>/dev/null
then
db_port="$(echo "$db_server" | sed 's#^[a-zA-Z0-9.]*:\([0-9]\{2,5\}\)$#\1#')"
db_server="$(echo "$db_server" | sed 's#^\([a-zA-Z0-9.]*\):[0-9]\{2,5\}$#\1#')"
else
db_port=3306 # default mysql port number
fi
# generate a set of pipes/processes
sets=
a=0
b=0
while [ $a -lt 2 ] # 20 processes (and pipes)
do
while [ $b -lt 10 ]
do
sets="${sets}${sets:+ }$a$b"
b=$(($b+1))
done
b=0
a=$(($a+1))
done
(
# remove all pipes on exit
trap 'rm -rf "/tmp/visit_boards"' EXIT
mkdir "/tmp/visit_boards"
pids_list=
# create pipes
# start wget processes for the read ends of the pipes
# open shell output file descriptors for the write ends of the pipes
for p in $sets
do
fdp="${p#0}" # convert 08 to 8
fdp=$(($fdp+5)) # first 3 file descriptiors are reserved, so start with the 4th
fdp_bit=$(($fdp % 2))
# create pipe
mkfifo "/tmp/visit_boards/pipe_$p"
# connect a new wget process to the read end of the pipe
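# (--spider only checks that the pages exist, nothing is downloaded; --server-response logs the HTTP headers)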
wget --server-response --spider --output-file="visit_$p.log" --input-file="/tmp/visit_boards/pipe_$p" &
pids_list="$pids_list${pids_list:+ }$!"
# connect the current shell process to the write end of the pipe
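# (the fd number is held in a variable, and the shell does not expand a variable used as the fd of a redirection, so the exec command is generated as text in a helper file and sourced)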
echo exec "$fdp>>/tmp/visit_boards/pipe_$p">"/tmp/visit_boards/shell_script_pipe"
. "/tmp/visit_boards/shell_script_pipe"
done
# output URLs to the pipes, one line into each pipe in turn
if $cubrid
then
csql -u dba -p arniarules --command='SELECT "domain" FROM xe_sites WHERE site_srl != 0' "$db_server" |
sed -n 's#^[[:space:]]*'\''\([a-zA-Z0-9_]*\)'\''$#\1#p'
else
echo "SELECT domain FROM xe_sites WHERE site_srl != 0" |
mysql -u dba --password=arniarules -h "$db_server" -P "$db_port" --skip-column-names "$database"
fi \
|
{
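# save the current shell flags and turn off tracing, so the URL loop does not flood the output; the flags are restored after the loop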
sh_flags="$-"
set +x
a=0
b=0
while read -r board_name
do
# output the current site URL to the current file descriptor
# (connected to one of the pipes)
p="$a$b"
fdp="${p#0}"
fdp=$(($fdp+5))
fdp_bit=$(($fdp%2))
echo "http://$web_server/$database/$board_name" >&$fdp
# advance (a, b) to the next pipe or cycle back to the
# first one (0, 0) if (2, 0) is reached
b=$(($b+1))
if [ $b -eq 10 ]
then
b=0
a=$(($a+1))
if [ $a -eq 2 ]
then
a=0
fi
fi
done
set "-$sh_flags"
}
if [ "$?" -ne "0" ]
then
# Old versions of bash (still current on CentOS) do not exit after
# errors from the nested (inner) pipe, even with set -e -o pipefail,
# but "$?" still indicates a non-zero status
exit 4;
fi
# list the open file descriptors for the process
for OPEN_FD in /dev/fd/*
do
open_fds="${open_fds}${open_fds:+ }${OPEN_FD#/dev/fd/}"
done
echo "$open_fds"
# close write end for all pipes
a=0
b=0
while [ $a -lt 2 ]
do
while [ $b -lt 10 ]
do
p="$a$b"
fdp="${p#0}"
fdp=$(($fdp+5))
# a redirection like 9>&- is defined to close file descriptor 9
echo exec "$fdp>&-" >"/tmp/visit_boards/shell_script_pipe"
. "/tmp/visit_boards/shell_script_pipe"
b=$(($b+1))
done
b=0
a=$(($a+1))
done
# list the open file descriptors for the process
open_fds=
for OPEN_FD in /dev/fd/*
do
open_fds="${open_fds}${open_fds:+ }${OPEN_FD#/dev/fd/}"
done
echo "$open_fds"
watch_pid=
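# when connected to a terminal, display the sizes of the wget log files once per second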
if [ -t 0 -a -t 1 ]
then
watch -n 1 ls -l visit_[0-1][0-9].log &
watch_pid="$!"
fi
error=
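# wait for every wget child and collect the PIDs of the ones that exited with an error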
for pid in $pids_list
do
wait $pid
if [ "$?" -ne 0 ]
then
error="$error${error:+ }$pid"
fi
done
if test -n "$watch_pid"
then
# `watch` utility will run until interrupted
kill -s INT "$watch_pid"
fi
if test -n "$error"
then
echo Child processes "($error)" returned errors.
exit 2
fi
# EXIT trap shall now remove pipes
)