Pipes (fifos) not working concurrently

Hello

I have a large list of URLs (from a database, generated automatically during tests) that I want to download using several wget processes at the same time. With our internal web servers, this will be a lot faster than downloading the pages one at a time with a single process.

So I create 20 pipes in my script with `mkfifo´ and connect the read end of each one to a new wget process for that fifo. The write end of each pipe is then connected to my script, with shell commands like `exec 18>>fifo_file_name´

Then my script writes one line with a URL to each of the pipes in turn, in a loop, and starts over with the first pipe until there are no more URLs from the database client.
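For reference, the setup described above can be reduced to a minimal sketch. It makes several assumptions that are not in the real script: `cat` stands in for wget, there are 3 pipes instead of 20, the URLs are hard-coded placeholders, and the temporary directory comes from `mktemp -d`. The pipes are opened in the same order as in the attached script (reader first, then the write end in the shell).

```shell
#!/bin/sh
# Sketch only: `cat` is a stand-in for wget, and n=3 instead of 20.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

n=3
i=0
while [ "$i" -lt "$n" ]
do
    mkfifo "$dir/pipe_$i"

    # reader on the read end; its open() blocks until a writer appears
    cat "$dir/pipe_$i" >"$dir/out_$i" &

    # writer: attach descriptor 3+i of this shell to the write end
    fd=$((i + 3))
    eval "exec $fd>>\"\$dir/pipe_\$i\""

    i=$((i + 1))
done

# hand out one line per pipe in turn (placeholder URLs)
i=0
for url in http://example.invalid/a http://example.invalid/b \
           http://example.invalid/c http://example.invalid/d
do
    fd=$((i + 3))
    echo "$url" >&$fd
    i=$(( (i + 1) % n ))
done

# close every write end so the readers can see end-of-file, then wait
i=0
while [ "$i" -lt "$n" ]
do
    fd=$((i + 3))
    eval "exec $fd>&-"
    i=$((i + 1))
done
wait
```

After `wait` returns, each `out_N` file holds the lines that were sent to the matching pipe.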

Much to my dismay I find that there is no concurrent / parallel download with the child `wget´ processes, and that for some strange reason only one wget process can download pages at a time, and after that process completes, another one can begin.

My script does feed *all* the pipes with data, one line to each pipe in turn, and has all the pipes written and closed by the time the first child process has even finished downloading.

Do you know why my child processes wait for each other like this, each one starting to read its fifo and download only after another has finished?

I figure it must be something about the pipes, because if I use regular files instead (and reverse the order: first write the URLs, then start wget to read them) then the child processes run in parallel as expected. The child processes also run in parallel if I open the write end of the pipes first and then start the wget processes for the read end.
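The regular-file variant can be sketched as follows. Again this is only an illustration under assumptions: `fetch_list` is a hypothetical stand-in for `wget --input-file=...`, there are 3 list files instead of 20, and the URLs are placeholders.

```shell
#!/bin/sh
# Sketch only: fetch_list is a hypothetical stand-in for
# `wget --input-file=...`, and n=3 instead of 20.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
n=3

fetch_list() {
    # pretend to download every URL listed in file $1
    while read -r url
    do
        echo "fetched $url"
    done <"$1"
}

# 1. write the URLs first, one line to each list file in turn
printf '%s\n' http://example.invalid/a http://example.invalid/b \
              http://example.invalid/c http://example.invalid/d |
    awk -v n="$n" -v dir="$dir" \
        '{ print > (dir "/list_" ((NR - 1) % n)) }'

# 2. only then start one background worker per list file
i=0
while [ "$i" -lt "$n" ]
do
    fetch_list "$dir/list_$i" >"$dir/out_$i" &
    i=$((i + 1))
done
wait
```

Because every list file is complete before its worker starts, no worker has to wait for a writer.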

They even ran in parallel with my pipes, but I saw that happen only once in all my attempts. I do not know what was special about that attempt; it happened at the beginning of the day, and the computers were not restarted or logged off overnight.

The pipes are created and deleted on every run, with mkfifo and rm.

Is there something special about fifos that makes them run in sequence if I open the read end first?

My script is attached here; I believe it is nicely formatted and clear enough.

Thank you,
Timothy Madden
#!/bin/sh

set -e -x -v
set -o pipefail || true

# set some local defaults
web_server="${web_server:-appserver}"
db_server="${db_server:-replication1}"
database="${database:-xe150}"
cubrid="${cubrid:-false}"

# parse command line
case "$#" in
(0)
    echo Syntax:
    echo "	$0"	"webserver[:port] [ dbserver[:port] [db] ]"
    echo "	$0"	"webserver[:port] db@dbserver[:port]"
    echo
    exit 1;;
(1)
    if test "$1" = "--help" || test "$1" = "--usage"
    then
	"$0"
	exit
    else
	web_server="$1"
    fi;;
(2)
    web_server="$1"

    if echo "$2" | grep -q '@'
    then
	cubrid=true
	db_server="$2"
	database="$(echo "$2" | sed 's#^\([a-zA-Z_0-9]*\)@.*$#\1#')"
    else
	db_server="$2"
    fi;;
(3)
    web_server="$1"
    db_server="$2"
    database="$3";;
(*)
    "$0"
    exit 1;;
esac

if echo "$db_server" | grep -q ':'
then
    db_port="$(echo "$db_server" | sed 's#^[a-zA-Z0-9.]*:\([0-9]\{2,5\}\)$#\1#')"
    db_server="$(echo "$db_server" | sed 's#^\([a-zA-Z0-9.]*\):[0-9]\{2,5\}$#\1#')"
else
    db_port=3306    # default mysql port number
fi

# generate a set of pipes/processes
sets=
a=0
b=0

while [ $a -lt 2 ]  # 20 processes (and pipes)
do
    while [ $b -lt 10 ]
    do
	sets="${sets}${sets:+ }$a$b"
	b=$(($b+1))
    done

    b=0
    a=$(($a+1))
done


(
    # remove all pipes on exit
    trap 'rm -rf "/tmp/visit_boards"' EXIT

    mkdir "/tmp/visit_boards"

    pids_list=

    # create pipes
    # start wget processes for the read ends of the pipes
    # open shell output file descriptors for the write ends of the pipes
    for p in $sets
    do
	fdp="${p#0}"    # convert 08 to 8
	fdp=$(($fdp+5))    # descriptors 0-2 are reserved; pipe 00 gets descriptor 5
	fdp_bit=$(($fdp % 2))

	# create pipe
	mkfifo "/tmp/visit_boards/pipe_$p"

	# connect a new wget process to the read end of the pipe
	wget --server-response --spider --output-file="visit_$p.log" --input-file="/tmp/visit_boards/pipe_$p" &
	pids_list="$pids_list${pids_list:+ }$!"

	# connect the current shell process to the write end of the pipe
	eval "exec $fdp>>/tmp/visit_boards/pipe_$p"
    done

    # output URLs to the pipes, one line into each pipe in turn
    if $cubrid
    then
	csql -u dba -p arniarules --command='SELECT "domain" FROM xe_sites WHERE site_srl != 0' "$db_server" |
		sed -n 's#^[[:space:]]*'\''\([a-zA-Z0-9_]*\)'\''$#\1#p'
    else
	echo "SELECT domain FROM xe_sites WHERE site_srl != 0" |
	    mysql -u dba --password=arniarules -h "$db_server" -P "$db_port" --skip-column-names "$database"
    fi \
	|
    {
	sh_flags="$-"
	set +x
	a=0
	b=0

	while read -r board_name
	do
	    # output the current site URL to the current file descriptor
	    # (connected to one of the pipes)
	    p="$a$b"
	    fdp="${p#0}"
	    fdp=$(($fdp+5))
	    fdp_bit=$(($fdp%2))

	    echo "http://$web_server/$database/$board_name" >&$fdp

	    # advance (a, b) to the next pipe or cycle back to the
	    # first one (0, 0) if (2, 0) is reached
	    b=$(($b+1))
	    if [ $b -eq 10 ]
	    then
		b=0
		a=$(($a+1))
		if [ $a -eq 2 ]
		then
		    a=0
		fi
	    fi
	done

	set "-$sh_flags"
    }

    if [ "$?" -ne "0" ]
    then
	# Old versions of bash (still current on CentOS) do not exit after
	# errors from the nested (inner) pipe, even with set -e -o pipefail,
	# but "$?" still indicates a non-zero status
	exit 4;
    fi

    # list the open file descriptors for the process
    open_fds=
    for OPEN_FD in /dev/fd/*
    do
	open_fds="${open_fds}${open_fds:+ }${OPEN_FD#/dev/fd/}"
    done
    echo "$open_fds"

    # close write end for all pipes
    a=0
    b=0
    while [ $a -lt 2 ]
    do
	while [ $b -lt 10 ]
	do
	    p="$a$b"
	    fdp="${p#0}"
	    fdp=$(($fdp+5))

	    # a redirection like 9>&- is defined to close file descriptor 9
	    eval "exec $fdp>&-"

	    b=$(($b+1))
	done
	
	b=0
	a=$(($a+1))
    done

    # list the open file descriptors for the process
    open_fds=
    for OPEN_FD in /dev/fd/*
    do
	open_fds="${open_fds}${open_fds:+ }${OPEN_FD#/dev/fd/}"
    done
    echo "$open_fds"

    watch_pid=
    if [ -t 0 ] && [ -t 1 ]
    then
	watch -n 1 ls -l visit_[0-1][0-9].log &
	watch_pid="$!"
    fi

    error=
    for pid in $pids_list
    do
	# test the status inside `if` so that `set -e` does not abort the loop
	if ! wait "$pid"
	then
	    error="$error${error:+ }$pid"
	fi
    done

    if test -n "$watch_pid"
    then
	# `watch` utility will run until interrupted
	kill -s INT "$watch_pid"
    fi

    if test -n "$error"
    then
	echo Child processes "($error)" returned errors.
	exit 2
    fi


    # EXIT trap shall now remove pipes
)

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
