Re: Question about apache-php concurrent process control

Thank you, Richard. I think I'd better explain a little about the project; then you or somebody else might be able to give some good suggestions given the restrictions of the project.


The project is to implement a digital library protocol called OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html). It acts as a broker for the harvester client to get metadata from libraries that have a Z39.50 server. The databases reside at the libraries and vary a lot in speed, number of records, and the way they accept connections from a Z39.50 client. The number of records at some libraries may be over a million, so the part that gets data from those libraries behaves very differently from library to library.

The harvester client sends an HTTP request, normally through a program such as Perl LWP, which typically sets a 180-second connection timeout.

According to the protocol, when the OAI-PMH data provider responds to a harvester HTTP request, it begins by connecting to the specific library's Z39.50 server, reads the data in, writes it to disk, translates it to another XML format, and then sends it to the harvester client. If there are too many records, the OAI-PMH provider should send back partial data with a resumption token. The harvester can later send an HTTP request to the same URL, with the resumption token as one of the POST variables, to get further records. This process can continue until all the records have been sent.
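For reference, a partial ListRecords response in OAI-PMH carries the token in a <resumptionToken> element, roughly like this (trimmed to the relevant part; the token value and attribute numbers are illustrative):

```xml
<ListRecords>
  <record>...</record>
  <record>...</record>
  <!-- more records, up to the page size the provider chose -->
  <resumptionToken completeListSize="1000000" cursor="0">
    session42/page1
  </resumptionToken>
</ListRecords>
```

An empty <resumptionToken/> in the final page tells the harvester the list is complete.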

Thus I normally use a Perl program, not a browser, to send the HTTP request and fetch the content, so the buffering behavior should not be due to browser settings.

I cannot echo the metadata directly back, since XSLT needs to be used for the transformation and new XML file(s) are written. The header() redirection would be very natural to use if it closed the connection before I start the very time-consuming work that follows.

The exec with & and the cron job are hard to use, since the Z39.50 connection carries a lot of state (connection ID, etc.) that cannot easily be passed to another script.

The harvester is normally not a human with a browser but a piece of code that loops, sending out HTTP requests as long as the page it gets back contains a <resumptionToken> tag. (It extracts the text between the open and close tags of <resumptionToken> and appends it to the next HTTP request as a POST variable to fetch the next page of records.) The problem is that each HTTP request imposes a timeout of 180 seconds.

Thus I have to return partial data within 3 minutes, while the whole process might take hours or even days. The process then continues to get data from the library server, transform it, and write it to disk in a particular directory. When the next request with a resumption token comes in, the program checks whether that directory exists and returns the data if it does. If it does not exist yet, the program either returns within 3 minutes or sends back a "not available" message.
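The check-the-directory design just described can be sketched roughly as below. The staging directory layout, file names, and the next_chunk() helper are illustrative assumptions, not part of any existing code:

```php
<?php
// Hypothetical sketch: given a resumption token, decide whether a
// background process has already staged records on disk for it.
function next_chunk($token, $baseDir) {
    if ($token === null) {
        return null;                    // first request: nothing staged yet
    }
    $dir = $baseDir . '/' . basename($token);
    if (!is_dir($dir)) {
        return null;                    // background fetch not done yet
    }
    $files = glob($dir . '/*.xml');
    sort($files);
    return count($files) ? $files[0] : null;
}

// Request handling: serve a staged page, or tell the harvester to retry.
$token = isset($_POST['resumptionToken']) ? $_POST['resumptionToken'] : null;
$file  = next_chunk($token, '/var/spool/oai_pmh');
if ($file !== null) {
    readfile($file);    // one page of records, with its own <resumptionToken>
    unlink($file);      // consume the page
} else {
    header('Retry-After: 120');   // not ready within the 180-second window
}
```

The point is that the request handler never does the slow Z39.50 work itself; it only inspects what the background fetcher has already written.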

Sorry for the long write-up. I hope someone has a suggestion for me. Thank you very much.

-------------------------------------------------------------------------------

> I now encounter a problem with flow control of my program with PHP. This
> is
> very crucial to the design of a pretty big project. This is what I want to
> do in the program:
>
> <?php
> do_A();
> header("Location: ".$result_of_do_A);

Depending on the buffering options in php.ini and/or Apache, this may or
may not just end your program, as I understand it.

Once you send the Location header, everything else is irrelevant, or at
least not reliable.

You could do:
echo $result_of_do_A;
flush();

and the user will see what happened with A, while waiting for B.
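In practice, getting the partial response out the door before do_B() runs usually takes a few more steps than a bare flush(). One commonly used pattern looks something like the sketch below; do_A() and do_B() are the placeholder names from the thread, stubbed out here so the sketch is self-contained:

```php
<?php
// Placeholder implementations from the thread; replace with real work.
function do_A() { return "partial result\n"; }
function do_B() { /* the long-running part */ }

ignore_user_abort(true);        // keep running if the client disconnects

$result = do_A();

// Telling the client exactly how much data is coming lets it stop
// reading as soon as that much has arrived.
header('Content-Length: ' . strlen($result));
header('Connection: close');
echo $result;

// Push the output through PHP's buffers and the web server's buffer.
while (ob_get_level() > 0) {
    ob_end_flush();
}
flush();

do_B();                         // continues after the client has its data
```

Whether the connection actually closes at this point still depends on the web server and any output-compression layers in between, so this is a best-effort technique rather than a guarantee.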

> do_B();
> ?>
>
> Since it takes do_B() quite a while to finish, I want the HTTP client to
> get the partial result from do_A() via a redirect before do_B() starts.
> But it seems that the redirection only occurs after the entire PHP
> program finishes, i.e., after do_B(). I sent the HTTP request through a
> browser, through curl on the command line with the -N (no buffer)
> option, and with a Perl LWP program I wrote. All of them suggest that
> header(), although placed before do_B() in the code, takes effect only
> after all the PHP code has finished. I added flush() after header() too,
> but it didn't work.

If that is what you are seeing, you probably have output buffering
turned on.

The Location: header is acted upon by the BROWSER, not by PHP, not by your
server.  The BROWSER sees that header and then jumps to somewhere else.

> My question is: Is there any way that I can return to the client through
> the HTTP response and then continue on with my program?

You could also look into the pcntl stuff to fork() or, depending on
various settings, you might get:
exec("do_B() &");

to get B to happen in the background.
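One caveat with that suggestion: exec() runs a shell command, not a PHP function, so the do_B() work would have to live in its own script. A minimal sketch of the background launch (the script path in the usage comment is an illustrative assumption):

```php
<?php
// Run a shell command in the background and return immediately.
// Redirecting all output and appending '&' is what lets exec()
// return without waiting for the command to finish.
function run_in_background($cmd) {
    exec($cmd . ' > /dev/null 2>&1 &');
}

// Illustrative usage -- the script name is hypothetical:
// run_in_background('php /path/to/do_b.php');
```

As the poster notes, this only helps if the background script can re-establish whatever state (such as the Z39.50 connection) the parent request held.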

With all that said:  As a general rule, when I found myself doing this
kind of stuff, I later realized that I hadn't really designed my
application very well for an end-user experience.

If it takes THAT long to finish B, then you're probably MUCH better off
putting something in a "ToDo List" in your database, and doing B "later"
in a cron job.

Then notify the user, through email or some kind of status display that
they will see on your site frequently, when "B is done".
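The "ToDo List" idea above can be sketched like this: the web request only records the job, and a worker invoked from cron does the slow part later. The table layout (jobs: id, payload, status) is an illustrative assumption, not part of the original thread:

```php
<?php
// Web-request side: record the job and return immediately.
function queue_job(PDO $db, $payload) {
    $stmt = $db->prepare(
        "INSERT INTO jobs (payload, status) VALUES (?, 'pending')");
    $stmt->execute(array($payload));
    return $db->lastInsertId();
}

// Cron side (e.g. run every minute): process the pending jobs.
function run_pending(PDO $db, $worker) {
    $jobs = $db->query(
        "SELECT id, payload FROM jobs WHERE status = 'pending'")
        ->fetchAll(PDO::FETCH_ASSOC);
    foreach ($jobs as $job) {
        $worker($job['payload']);           // the slow do_B()-style work
        $db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")
           ->execute(array($job['id']));
    }
}
```

A real worker would also need to mark jobs "running" before starting them, so two overlapping cron invocations don't pick up the same job.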

NEVER make the user sit around waiting for your server.  Human time is far
far far too precious (and expensive!) to waste it sitting around doing
nothing useful waiting for your program to finish.

--
Like Music?
http://l-i-e.com/artists.htm



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

