Re: Handling (very) large files with PHP

On Wed, Mar 11, 2009 at 03:04:19AM +0200, ?????????? ???????? wrote:

> *Handling (very) large files with PHP*
> 
> Hello, I am planning a project in PHP, and I have a few unsolved issues that
> I'd like your help with.
> 
> The project will start by loading a file of about 50GB.
> The file has many objects with a pattern, for example,
> 
> 
> Name: Joe
> Joe likes to eat
> -------------------
> Name: Daniel
> Daniel likes to ask question on the PHP Mailing List
> 
> 
> Anyway, so, I am going to convert it into a database, and I insist on using
> PHP for this.
> 
> So the questions are,
> How would I open the file? Will fopen and fread($file, 1024) work? If so,
> how would I find the separator, "------------------", without using too
> many resources?

Not sure what the memory limits of PHP are on your server, but fifty
gigabytes is far more than you can hold in memory at once, so the file
has to be read in pieces. fread($file, 1024) will work, but a record or
a separator can straddle two chunks, and you'd have to stitch those back
together yourself. The file() call is no good here: it returns an array
containing every line in the file, and all of it is held in memory. Use
the fgets() function instead. This will read your file a line at a time,
so the separator is simply a line made up of dashes. You'll have to
process and dispose of the lines as you go, but it won't strain your
resources. It could take extra time to execute, though.
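
Here is a rough sketch of the fgets() approach, not a finished loader.
It assumes a separator is any line made up only of dashes, as in your
sample, and that blank lines can be ignored. The function name
readRecords() is made up for illustration, and the generator and type
declarations need PHP 7 or later.

```php
<?php
// Hypothetical helper: yields one record at a time as an array of lines,
// so only the current record is ever held in memory.
// Assumption: records are separated by a line consisting only of dashes.
function readRecords(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $record = [];
        while (($line = fgets($handle)) !== false) {
            $line = rtrim($line, "\r\n");
            if ($line !== '' && strspn($line, '-') === strlen($line)) {
                // Separator line: hand back the record collected so far.
                if ($record !== []) {
                    yield $record;
                }
                $record = [];
            } elseif ($line !== '') {
                $record[] = $line;
            }
        }
        if ($record !== []) {
            yield $record; // the last record has no trailing separator
        }
    } finally {
        fclose($handle);
    }
}
```

You would then loop with foreach (readRecords($path) as $record) and
deal with each record before moving on to the next.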

> I'll have a dedicated server for this project so I could use exec, so I am
> wondering if I should use exec to split the file?
> How many hours or days do you think it will take me to insert all of the
> data, if I have about 8,000,000,000 (8 billion/milliard) entries (objects)?

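Unknown, but you need to worry more about the timeout value on your PHP
scripts. I've never messed with it, but it's the max_execution_time
setting in php.ini. It defaults to 30 seconds under a web server; scripts
run from the command line have no limit by default.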
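
If you'd rather not edit php.ini, the limit can also be lifted from
inside the script. A minimal sketch, assuming your host allows the
setting to be changed at runtime:

```php
<?php
// Lift the execution time limit for this script only (0 means no limit).
set_time_limit(0);

// Show the value now in effect, as a sanity check before a long run.
echo 'max_execution_time = ', ini_get('max_execution_time'), "\n";
```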

> 
> After I insert all the data, I'll have to start working with it as well -
> for example, having a list of all people and what comes after the word
> "likes" in their entry.
> 
> What do you suggest? I am concerned I might not be able to fully accomplish
> both high speed when working with the data (example above) and high speed
> when viewing the data and adding more "works" (as stated above) with PHP.
> What do you think?
> After considering it, inserting into the database will probably be done in
> C. But if I wish to work with it - will PHP be good?

Wait-- are you saying you'll do the inserts in C? Then you might as well
process the input in C as well, if you have the skills to do that.

Yes, PHP will be fine to work with the database afterward.
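
As for "what comes after the word likes", one option is to pull that
text out while loading, so it can sit in its own column. A small
sketch; the function name afterLikes() is made up, and it assumes the
word "likes" appears on a single line as in your sample:

```php
<?php
// Hypothetical helper: returns the text following the first "likes"
// on a line, or null if the line does not contain that word.
function afterLikes(string $line): ?string
{
    if (preg_match('/\blikes\s+(.+)$/', $line, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}
```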

> 
> What database should I use for so much info?
> 

PostgreSQL. You could also use MySQL, but I prefer PostgreSQL. Both have
APIs for C and PHP, so you could insert and manipulate the data in either
language. In fact, you might find handling the input easier in Perl or
Python, where you don't have to deal with variable initialization as
much, and both of those have APIs for MySQL and PostgreSQL too. I believe
both DBMSes can handle databases of this size.
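
If you do the inserts from PHP, here is a sketch of batching them with
PDO, one transaction per batch rather than one per row. The table
"people", its columns, the function name insertInBatches() and the
batch size are all assumptions for illustration.

```php
<?php
// Hypothetical helper: inserts [name, likes] pairs in batches and
// returns the number of rows inserted.
// Assumption: a table people(name, likes) already exists, and PDO is in
// exception error mode (the default since PHP 8).
function insertInBatches(PDO $pdo, iterable $rows, int $batchSize = 1000): int
{
    $stmt = $pdo->prepare('INSERT INTO people (name, likes) VALUES (?, ?)');
    $count = 0;
    $pdo->beginTransaction();
    try {
        foreach ($rows as [$name, $likes]) {
            $stmt->execute([$name, $likes]);
            if (++$count % $batchSize === 0) {
                // Close this batch and start the next one.
                $pdo->commit();
                $pdo->beginTransaction();
            }
        }
        $pdo->commit(); // final, possibly partial, batch
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack(); // undo only the unfinished batch
        }
        throw $e;
    }
    return $count;
}
```

With billions of rows, the bulk loaders (COPY in PostgreSQL, LOAD DATA
INFILE in MySQL) will probably be a good deal faster than row-by-row
inserts, so it may pay to write out a delimited file and load that.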

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

