On Wed, Mar 11, 2009 at 03:04:19AM +0200, ?????????? ???????? wrote:

> *Handling (very) large files with PHP*
>
> Hello, I am planning a project in PHP, and I have a few unsolved issues
> that I'd like your help with...
>
> The project will start by loading a file of about 50GB.
> The file has many objects following a pattern, for example:
>
> Name: Joe
> Joe likes to eat
> -------------------
> Name: Daniel
> Daniel likes to ask questions on the PHP Mailing List
>
> Anyway, I am going to convert it into a database, and I insist on using
> PHP for this.
>
> So the questions are:
> How would I open the file? Will fopen() and fread($file, 1024) work? If
> so, how would I find the separator, "-------------------", without using
> too many resources?

Not sure what the memory limit of PHP is on your server, but a fifty
gigabyte file is far too much to hold in memory at once. There are two
better alternatives to fread(). The first is file(), which reads the whole
file and returns an array containing every line in it. However, since that
array is held entirely in memory, it would be a problem here. The other is
fgets(), which reads the file one line at a time. You'll have to process
and dispose of the lines as you go, but it won't strain your resources. It
could take extra time to execute, though. A rough sketch of the fgets()
approach is below my signature.

> I'll have a dedicated server for this project so I could use exec, so I
> am wondering if I should use exec to split the file?
> How many hours or days do you think it will take me to insert all of the
> data, if I have about 8,000,000,000 (8 billion/milliard) entries
> (objects)?

Unknown, but you need to worry more about the timeout value on your PHP
scripts (max_execution_time). I've never messed with it, but I know it's
there as a config value for PHP.

> After I insert all the data, I'll have to start working with it as well
> -- for example, having a list of all people and what comes after the word
> "likes" in their entry.
>
> What do you suggest? I am concerned I might not be able to get both high
> speed for that kind of processing (the example above) and high speed when
> browsing the data and adding more "work" (as stated above) with PHP. What
> do you think?
> Inserting into the database, after considering it, will probably be done
> in C. But if I wish to work with the data afterward -- will PHP be good?

Wait -- are you saying you'll do the inserts in C? Then you might as well
process the input in C as well, if you have the skills to do that. Yes,
PHP will be fine for working with the database afterward.

> What database should I use for so much info?

PostgreSQL. You could also use MySQL, but I prefer PostgreSQL. Both have
APIs for C and PHP, so you could insert and manipulate the data in either
language. In fact, you might find handling the input easier in Perl or
Python, where you don't have to deal with variable initialization as much,
and both of those also have APIs for MySQL and PostgreSQL. And I believe
both DBMSes can handle databases of this size.

Paul

-- 

Paul M. Foster
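Just to make the fgets() suggestion concrete, here is a rough, untested
sketch. It assumes each record looks exactly like the two-line example in
your post, that the input file is called bigfile.txt, and that each record
is handed to a placeholder process_record() function where your database
insert would go -- all of those are assumptions you'd adjust to the real
format.

<?php
// Rough sketch only. Assumes each record looks like the example in the
// original post: a "Name: ..." line, a "... likes ..." line, and a row of
// dashes as the separator. The file name and process_record() are
// placeholders to be replaced with the real details.

$handle = fopen('bigfile.txt', 'r');
if ($handle === false) {
    die("Could not open input file\n");
}

$record = array();

while (($line = fgets($handle)) !== false) {
    $line = rtrim($line, "\r\n");

    if (preg_match('/^-+$/', $line)) {
        // Separator reached: hand off the accumulated lines, start fresh.
        process_record($record);
        $record = array();
    } else {
        $record[] = $line;
    }
}

// Catch the last record if the file doesn't end with a separator.
if (!empty($record)) {
    process_record($record);
}

fclose($handle);

// Placeholder: this is where the database insert (or a batched insert)
// would go. Here it just pulls the name out as a demonstration.
function process_record(array $lines)
{
    foreach ($lines as $line) {
        if (strpos($line, 'Name:') === 0) {
            $name = trim(substr($line, 5));
            echo "Found entry for $name\n";
        }
    }
}
?>

With 8 billion records you would not want one INSERT statement per record
inside process_record(); batching inserts inside transactions, or writing
an intermediate file and bulk-loading it with PostgreSQL's COPY or MySQL's
LOAD DATA INFILE, is generally far faster.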