Chris wrote:
Paul Scott wrote:
On Thu, 2007-09-20 at 09:54 -0300, Martin Marques wrote:
If not, you should just use the COPY command of PostgreSQL (you are
using PostgreSQL if I remember correctly) or simply do a bash script
using psql and the \copy command.
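(For example, from PHP a COPY could be done roughly like this sketch, assuming the pgsql extension; the connection string, table name and delimiter are just placeholders.)

<?php
// Minimal sketch: bulk-load a CSV into PostgreSQL with COPY via
// pg_copy_from(). Connection details and table name are made up.
$db = pg_connect('host=localhost dbname=mydb user=myuser') or die('connect failed');

$rows = array();
$fh = fopen('data.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    // pg_copy_from() wants one delimited line per row.
    $rows[] = implode("\t", $fields) . "\n";
}
fclose($fh);

// One COPY for the whole file instead of one INSERT per row.
pg_copy_from($db, 'mytable', $rows, "\t") or die(pg_last_error($db));
?>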
Unfortunately, this has to work on all supported RDBMSs, so using
Postgres- or MySQL-specific functions is not really an option. What I am
trying, though, is to add a function to our database abstraction layer
to do batch inserts as per Rob's suggestion, which may help things a
bit.
Both of these support importing CSV files (use \copy as Martin mentioned
for postgres). MySQL has this:
http://dev.mysql.com/doc/refman/4.1/en/load-data.html
If you're supporting more than those two, I can't really say whether the
others support this type of feature :)
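(Roughly, from PHP that would look something like the sketch below; the path, table and credentials are placeholders, and LOCAL has to be enabled on both the client and the server.)

<?php
// Sketch: load a CSV straight into MySQL with LOAD DATA LOCAL INFILE.
// Host, credentials, file path and table name are all placeholders.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');

$sql = "LOAD DATA LOCAL INFILE '/tmp/data.csv'
        INTO TABLE mytable
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'";

if (!$mysqli->query($sql)) {
    die('LOAD DATA failed: ' . $mysqli->error);
}
?>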
Try batches:
begin;
... 5000 rows
commit;
and rinse/repeat. I know postgres will be a lot happier with that
because otherwise it's doing a transaction per insert.
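(In PHP, that batching could look something like this sketch; PDO is used here just for illustration, and the DSN, table and column names are invented.)

<?php
// Sketch: commit every 5000 rows in one transaction instead of
// letting every single INSERT be its own transaction.
// DSN, table and column names are placeholders.
$pdo  = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'password');
$stmt = $pdo->prepare('INSERT INTO mytable (col1, col2) VALUES (?, ?)');

$batchSize = 5000;
$count = 0;

$fh = fopen('data.csv', 'r');
$pdo->beginTransaction();
while (($fields = fgetcsv($fh)) !== false) {
    $stmt->execute(array($fields[0], $fields[1]));
    if (++$count % $batchSize == 0) {
        $pdo->commit();          // rinse/repeat
        $pdo->beginTransaction();
    }
}
$pdo->commit();                  // flush the last partial batch
fclose($fh);
?>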
If you take the insert out of the equation (i.e. it just runs through the
file, parses it, etc.), is it fast? That'll tell you at least where the
bottleneck is.
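(A quick way to check that, purely as a sketch with a placeholder file name:)

<?php
// Time a parse-only pass over the file: no INSERTs, just fgetcsv().
// If this alone is slow, the bottleneck isn't the database.
$start = microtime(true);
$rows  = 0;
$fh = fopen('data.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    $rows++;
}
fclose($fh);
printf("Parsed %d rows in %.2f seconds\n", $rows, microtime(true) - $start);
?>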
(Personally I'd use perl over php for processing files that large but
that may not be an option).
As for MySQL, if the table that you are inserting into has any indexes at
all, then each time an insert/update completes MySQL will update every
index on that table.
Therefore, if you can batch the data like Rob suggested, it will cut
down on the number of index updates and speed the process up substantially.
The one thing that you will need to watch for is the maximum size of
an insert/update SQL statement for the type of DB that you are using.
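(A rough sketch of that kind of multi-row batching with a length cap; the 500 KB cap, table and column names are made up, and in MySQL the real ceiling is governed by max_allowed_packet.)

<?php
// Sketch: build multi-row INSERTs but flush whenever the statement
// would exceed a cap, to stay under the server's statement size limit.
// Connection details, table/column names and the cap are placeholders.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
$maxLen = 500000;   // stay safely under max_allowed_packet
$prefix = 'INSERT INTO mytable (col1, col2) VALUES ';
$values = array();
$length = strlen($prefix);

$fh = fopen('data.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    $row = sprintf("('%s','%s')",
        $mysqli->real_escape_string($fields[0]),
        $mysqli->real_escape_string($fields[1]));

    // Flush the current batch before this row would push us over the cap.
    if ($length + strlen($row) + 1 > $maxLen && count($values) > 0) {
        $mysqli->query($prefix . implode(',', $values));
        $values = array();
        $length = strlen($prefix);
    }
    $values[] = $row;
    $length += strlen($row) + 1;   // +1 for the comma separator
}
if (count($values) > 0) {
    $mysqli->query($prefix . implode(',', $values));   // final partial batch
}
fclose($fh);
?>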
Hope this helps.
Jim
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php