Re: Question about apache-php concurrent process control

Thank you, Richard. I think I'd better explain a little about the project; then you or somebody else might be able to give some good suggestions given the restrictions of the project.


The project is to implement a digital library protocol called OAI-PMH (http://www.openarchives.org/OAI/openarchivesprotocol.html). It acts as a broker for the harvester client to get metadata from libraries that have a Z39.50 server. The databases reside at the libraries and vary a lot in speed, number of records, and the way they accept connections from a Z39.50 client. The number of records at some libraries may be over a million, so the part that gets data from those libraries behaves very differently from library to library.

The harvester client sends an HTTP request, normally through a program such as Perl LWP, which typically sets a 180-second connection timeout.

According to the protocol, when the OAI-PMH data provider responds to a harvester HTTP request, it begins by connecting to the specific library's Z39.50 server, reads the data in, writes it to disk, translates it to another XML format, and then sends it to the harvester client. If there are too many records, the OAI-PMH provider should send back partial data with a resumption token. The harvester can later send an HTTP request to the same URL, with the resumption token as one of the POST variables, to get further records. This process can continue until all the records have been sent.
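For reference, a partial ListRecords response in OAI-PMH carries the token in a <resumptionToken> element, roughly like this (trimmed to the relevant part; the token value and attribute numbers are illustrative):

```xml
<ListRecords>
  <record>...</record>
  <record>...</record>
  <!-- more records, up to the page size the provider chose -->
  <resumptionToken completeListSize="1000000" cursor="0">
    session42/page1
  </resumptionToken>
</ListRecords>
```

An empty <resumptionToken/> in the final page tells the harvester the list is complete.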

Thus I normally use a Perl program, not a browser, to send the HTTP request and fetch the content, so the buffering behavior should not be due to browser settings.

I cannot echo the metadata directly back, since XSLT needs to be used for the transformation and new XML file(s) are written. The header() redirection would be very natural to use if it closed the connection before I start the very time-consuming work that follows.

The exec with & and the cron job are hard to use, since the Z39.50 connection carries a lot of state (connection ID, etc.) that cannot easily be passed to another script.

The harvester is normally not a human with a browser but a piece of code that loops, sending out HTTP requests as long as the page it gets back contains a <resumptionToken> tag. (It extracts the text between the open and close tags of <resumptionToken> and appends it to the next HTTP request as a POST variable to fetch the next page of records.) The problem is that each HTTP request imposes a timeout of 180 seconds.

Thus I have to return partial data within 3 minutes, while the whole process might take hours or even days. The process then continues to get data from the library server, transform it, and write it to disk in a particular directory. When the next request with a resumption token comes in, the program checks whether that directory exists and returns the data if it does. If it does not exist yet, the program either returns within 3 minutes or sends back a "not available" message.
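The check-the-directory design just described can be sketched roughly as below. The staging directory layout, file names, and the next_chunk() helper are illustrative assumptions, not part of any existing code:

```php
<?php
// Hypothetical sketch: given a resumption token, decide whether a
// background process has already staged records on disk for it.
function next_chunk($token, $baseDir) {
    if ($token === null) {
        return null;                    // first request: nothing staged yet
    }
    $dir = $baseDir . '/' . basename($token);
    if (!is_dir($dir)) {
        return null;                    // background fetch not done yet
    }
    $files = glob($dir . '/*.xml');
    sort($files);
    return count($files) ? $files[0] : null;
}

// Request handling: serve a staged page, or tell the harvester to retry.
$token = isset($_POST['resumptionToken']) ? $_POST['resumptionToken'] : null;
$file  = next_chunk($token, '/var/spool/oai_pmh');
if ($file !== null) {
    readfile($file);    // one page of records, with its own <resumptionToken>
    unlink($file);      // consume the page
} else {
    header('Retry-After: 120');   // not ready within the 180-second window
}
```

The point is that the request handler never does the slow Z39.50 work itself; it only inspects what the background fetcher has already written.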

Sorry for the long write-up. I hope someone has a suggestion for me. Thank you very much.

-------------------------------------------------------------------------------

> I now encounter a problem with flow control of my program with PHP. This
> is
> very crucial to the design of a pretty big project. This is what I want to
> do in the program:
>
> <?php
> do_A();
> header("Location: ".$result_of_do_A);

Depending on the buffering options in php.ini and/or Apache, this may or
may not just end your program, as I understand it.

Once you send the Location header, everything else is irrelevant, or at
least not reliable.

You could do:
echo $result_of_do_A;
flush();

and the user will see what happened with A, while waiting for B.
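In practice, getting the partial response out the door before do_B() runs usually takes a few more steps than a bare flush(). One commonly used pattern looks something like the sketch below; do_A() and do_B() are the placeholder names from the thread, stubbed out here so the sketch is self-contained:

```php
<?php
// Placeholder implementations from the thread; replace with real work.
function do_A() { return "partial result\n"; }
function do_B() { /* the long-running part */ }

ignore_user_abort(true);        // keep running if the client disconnects

$result = do_A();

// Telling the client exactly how much data is coming lets it stop
// reading as soon as that much has arrived.
header('Content-Length: ' . strlen($result));
header('Connection: close');
echo $result;

// Push the output through PHP's buffers and the web server's buffer.
while (ob_get_level() > 0) {
    ob_end_flush();
}
flush();

do_B();                         // continues after the client has its data
```

Whether the connection actually closes at this point still depends on the web server and any output-compression layers in between, so this is a best-effort technique rather than a guarantee.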

> do_B();
> ?>
>
> Since it takes do_B() quite a while to finish, I want the HTTP client to
> get the partial result from do_A() via a redirect before do_B() starts.
> But it seems that the redirection only occurs after the entire PHP
> program finishes, i.e., after do_B(). I sent the HTTP request through a
> browser, through curl on the command line with the -N (no buffer)
> option, and with a Perl LWP program I wrote. All of them suggest that
> header(), although placed before do_B() in the code, takes effect only
> after all the PHP code has finished. I added flush() after header() too,
> but it didn't work.

If that is what you are seeing, you probably have output buffering
turned on.

The Location: header is acted upon by the BROWSER, not by PHP, not by your
server.  The BROWSER sees that header and then jumps to somewhere else.

> My question is: Is there any way that I can return to the client through
> the HTTP response and then continue on with my program?

You could also look into the pcntl stuff to fork() or, depending on
various settings, you might get:
exec("do_B() &");

to get B to happen in the background.
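One caveat with that suggestion: exec() runs a shell command, not a PHP function, so the do_B() work would have to live in its own script. A minimal sketch of the background launch (the script path in the usage comment is an illustrative assumption):

```php
<?php
// Run a shell command in the background and return immediately.
// Redirecting all output and appending '&' is what lets exec()
// return without waiting for the command to finish.
function run_in_background($cmd) {
    exec($cmd . ' > /dev/null 2>&1 &');
}

// Illustrative usage -- the script name is hypothetical:
// run_in_background('php /path/to/do_b.php');
```

As the poster notes, this only helps if the background script can re-establish whatever state (such as the Z39.50 connection) the parent request held.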

With all that said:  As a general rule, when I found myself doing this
kind of stuff, I later realized that I hadn't really designed my
application very well for an end-user experience.

If it takes THAT long to finish B, then you're probably MUCH better off
putting something in a "ToDo List" in your database, and doing B "later"
in a cron job.

Then notify the user, through email or some kind of status display that
they will see on your site frequently, when "B is done".
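The "ToDo List" idea above can be sketched like this: the web request only records the job, and a worker invoked from cron does the slow part later. The table layout (jobs: id, payload, status) is an illustrative assumption, not part of the original thread:

```php
<?php
// Web-request side: record the job and return immediately.
function queue_job(PDO $db, $payload) {
    $stmt = $db->prepare(
        "INSERT INTO jobs (payload, status) VALUES (?, 'pending')");
    $stmt->execute(array($payload));
    return $db->lastInsertId();
}

// Cron side (e.g. run every minute): process the pending jobs.
function run_pending(PDO $db, $worker) {
    $jobs = $db->query(
        "SELECT id, payload FROM jobs WHERE status = 'pending'")
        ->fetchAll(PDO::FETCH_ASSOC);
    foreach ($jobs as $job) {
        $worker($job['payload']);           // the slow do_B()-style work
        $db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")
           ->execute(array($job['id']));
    }
}
```

A real worker would also need to mark jobs "running" before starting them, so two overlapping cron invocations don't pick up the same job.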

NEVER make the user sit around waiting for your server.  Human time is far
far far too precious (and expensive!) to waste it sitting around doing
nothing useful waiting for your program to finish.

--
Like Music?
http://l-i-e.com/artists.htm



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

