Re: Handling (very) large files with PHP

Anyway, so, I am going to convert it into a database, and I insist on using
PHP for this.

Wrong answer. Use the right tool for the job, and I don't think PHP is it. Personally I'd go for Perl for this. It's much better (IMO) at text processing, especially with large files. As long as you know regexes (almost everything in Perl is a regex), you'll be fine.

So the questions are:
How would I open the file? Will fopen() and fread($file, 1024) work? If so,
how would I find the separator, "------------------", without taking too
many resources?

Not sure what the memory limits of PHP are on your server. Fifty megs
might be too much. However, there are two better alternatives to
fread(). First is the file() call, which returns an array containing
every line in the file. However, since the whole thing is held in
memory, it could be a problem. Otherwise, use the fgets() function. This
will read your file a line at a time. You'll have to process and dispose
of the lines as you go, but it won't strain your resources. It could
take extra time to execute, though.

If 50 meg is too much, 50G is way too much ;) file() reads the whole thing in memory - so unless you have > 50G free on the machine, don't do it that way. Read/process it line by line - at least you won't kill your server.
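Something along these lines would do it. This is only a rough sketch: the filename, the exact separator line, and what process_record() does are assumptions you'll need to adapt to your data.

<?php
// Read the dump with fgets() so only one record is ever in memory.
function process_record($record)
{
    // placeholder: this is where you'd parse the record and do something with it
    echo "got a record of " . count($record) . " lines\n";
}

$fh = fopen('dump.txt', 'r');
if ($fh === false) {
    die("Could not open file\n");
}

$record = array();
while (($line = fgets($fh)) !== false) {
    if (strpos($line, '------------------') === 0) {
        // a line of dashes ends the current record
        process_record($record);
        $record = array();
    } else {
        $record[] = rtrim($line, "\r\n");
    }
}
if (count($record) > 0) {
    // catch the last record if the file doesn't end with a separator
    process_record($record);
}
fclose($fh);
?>

Memory use stays flat no matter how big the file is, since you only ever hold one record at a time.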

Inserting into the database, after considering it, will probably be done with
C. But if I wish to work with the data afterwards - will PHP be good?

Wait-- are you saying you'll do the inserts in C? Then you might as well
process the input in C as well, if you have the skills to do that.

Agreed.

Also, there's no point processing it twice (once in PHP or whatever and then again in C). Both MySQL and PostgreSQL have a way to import CSV files. Generate the right format and import it that way - there's a sketch of that below. You'll save time importing the data too (and only add your indexes at the end if possible; that makes plain data inserts quicker, otherwise it's updating indexes as well as data as it adds the rows).
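For example, using the same read loop as the sketch above but writing one CSV row per record instead of processing it. Filenames, the separator, the assumption that each line of a record is one field, and the table name are all made up here:

<?php
// Stream the dump, write one CSV row per record, then bulk-load the result.
$in  = fopen('dump.txt', 'r');
$out = fopen('records.csv', 'w');

$record = array();
while (($line = fgets($in)) !== false) {
    if (strpos($line, '------------------') === 0) {
        fputcsv($out, $record);   // fputcsv() handles quoting/escaping
        $record = array();
    } else {
        $record[] = rtrim($line, "\r\n");
    }
}
if (count($record) > 0) {
    fputcsv($out, $record);       // last record if the file doesn't end with a separator
}
fclose($in);
fclose($out);

// Then load the finished file in one statement, e.g.
//   PostgreSQL: COPY mytable FROM '/path/to/records.csv' WITH CSV;
//   MySQL:      LOAD DATA LOCAL INFILE 'records.csv' INTO TABLE mytable
//               FIELDS TERMINATED BY ',' ENCLOSED BY '"';
?>

One bulk load like that will beat millions of individual INSERT statements by a wide margin.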

PostgreSQL. You could also use MySQL, but I prefer PostgreSQL. Both have
APIs for C and PHP, so you could insert/manipulate the data in either
language. In fact, you might find handling the input easier in Perl or
Python, where you don't have to deal with variable initialization as
much, and both of those also have APIs for MySQL and PostgreSQL. I
believe both DBMSes can handle databases of this size.

Both can definitely handle it. The question is what happens to the data afterwards. What sort of queries will you be running? Which databases do you have experience with? Tuning the two servers is completely different.

I'd also go for postgres but hey ;)
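And if you do end up doing the inserts straight from PHP instead of via COPY, the pg_* functions are enough. A rough sketch only - the connection details, table name and columns here are all assumptions:

<?php
// Insert a row from PHP with the pg_* API.
$db = pg_connect('host=localhost dbname=mydb user=me password=secret');
if (!$db) {
    die("could not connect\n");
}

// pg_query_params() sends the values separately, so no manual escaping needed
$fields = array('some name', 'some value');
pg_query_params($db, 'INSERT INTO mytable (name, value) VALUES ($1, $2)', $fields);

pg_close($db);
?>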

--
Postgresql & php tutorials
http://www.designmagick.com/



