At the risk of making a complete and utter ass of myself, I'm going to
disagree with Richard. My justification is that file_get_contents() is
written in C and does the job that wget is currently being used for.

On 8/25/05, Michelle Konzack <linux4michelle@xxxxxxxxxx> wrote:
> Hello,
>
> Currently I do it with wget and by hand using a bash script,
> but like to integrate it into my php4 webinterface.
>
> What I need is:
>
> 1) INPUT-Form where I can type the URL of
>    a html/php (or something like this) page.

I assume you know the HTML to create a web form, and how to use the
$_GET and $_POST variables. If not, go learn PHP, and then read the
rest of my reply.

>
> when submitted,
>
> 2) the php script download the page and create an md5sum

Assuming that allow_url_fopen is enabled, you can do:

    $content = file_get_contents($url);
    $md5hash = md5($content);

> 3) look in a database where it check the whole URL whether
>    it is already there and if
>    YES check the md5sum

What DB are you using?

>        3.1) if equal drop the URL and stop here
>        3.2) if different calculate original md5sum
>             and insert it into database
>    NO  calculate original md5sum and insert it into database
>
> up to here it is working fine.
>
> 4) now get all FULL URIs from the page requisites
>
> *PAFF*
>
> How can this be done ?
>
> Please note, that the files should be renamed to md5-hashes and
> reinserted into the original page. Then saved all files into ONE
> directory with names as md5-hashes.
>
> Note: I am talking about (currently) 127.000.000 files.
>       It is currently in a Raid-5 with 7 x 147 GByte but because
>       a major upgrade of Hardware to 15 x 300 GByte the number
>       of files will increase
>
> Currently I do not know, whether I should use ONE Raid with
> 15 HDDs, TWO with 7 HDDs, three with 5 HDDs or 5 with 3 HDDs.
>
> Maybe I will run into a performance problems with the Inodes
> which I already have...
> (I think)
>
> Greetings
> Michelle
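
To make steps 2 and 3 a bit more concrete, here is a rough sketch of how the download, the md5sum and the "is it already known?" check could fit together. Since I don't know which DB you are using, a plain array ($known, mapping URL to md5 hash) stands in for the database table. The function names fetch_page() and classify_page() are made up for this example, and fetch_page() still assumes that allow_url_fopen is enabled. Treat it as a starting point, not a finished implementation.

```php
<?php
// Hypothetical helper: download a page, returning null on failure.
// Relies on allow_url_fopen being enabled in php.ini.
function fetch_page($url)
{
    $content = @file_get_contents($url);
    if ($content === false) {
        return null;
    }
    return $content;
}

// Hypothetical helper: decide what to do with a downloaded page.
// $known maps URL => md5 hash and stands in for the real database.
// Returns 'unchanged', 'changed' or 'new'.
function classify_page($url, $content, &$known)
{
    $hash = md5($content);

    if (isset($known[$url]) && $known[$url] === $hash) {
        // Step 3.1: same hash as before, nothing to do.
        return 'unchanged';
    }

    // Step 3.2 (hash differs) or the NO branch (URL not seen yet):
    // store the current hash in both cases.
    $status = isset($known[$url]) ? 'changed' : 'new';
    $known[$url] = $hash;

    return $status;
}
```

In the form handler you would call fetch_page() with the submitted URL, bail out if it returns null, and otherwise pass the content to classify_page(). Once you know which database you are on, the isset() lookup and the array assignment become a SELECT and an INSERT or UPDATE.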