Loading large volumes of data

This is more of a process question than a PHP-specific one, but maybe
someone here can help me.

I have a text file containing strings. The file is pretty massive, about
895 MB. I need to load the words in the text file into the database in such
a way that each word appears only once in the table. What this means is
that the script has to read each word in the file, check whether the word
is already in the table, and if not, add it. Obviously, as the table grows
the rate of execution drops, because each uniqueness check has to look
through more and more rows.
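For concreteness, the loop currently does something roughly like this (the
words table, word column, file name and credentials are just placeholders):

<?php
// Current approach: one lookup and (possibly) one insert per word.
mysql_connect('localhost', 'user', 'password');   // placeholder credentials
mysql_select_db('mydb');                           // placeholder database

$fh = fopen('words.txt', 'r');
while (($line = fgets($fh)) !== false) {
    $word = trim($line);
    if ($word === '') {
        continue;
    }
    $escaped = mysql_real_escape_string($word);
    $result  = mysql_query("SELECT 1 FROM words WHERE word = '$escaped'");
    if (mysql_num_rows($result) == 0) {
        mysql_query("INSERT INTO words (word) VALUES ('$escaped')");
    }
}
fclose($fh);
?>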

Does anyone know of an efficient way to do this? There are over a million
lines in the file, and after running overnight my script has only got
through about 340 000 of them. I need to get the data into the table much
faster than that.

I considered hashing the file in memory (i.e. doing the uniqueness checking
in an associative array), but I'm not sure how much RAM that would consume
and I don't want to crash the server.
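In case it clarifies the question, the in-memory version I have in mind
would look something like this (table and file names are placeholders
again, and it assumes a connection is open as in the first sketch). The
memory used would depend on the number of distinct words, not on the
895 MB file size:

<?php
// Do the uniqueness check in an associative array instead of against the
// table, and only INSERT words that have not been seen before.
$seen = array();
$fh = fopen('words.txt', 'r');
while (($line = fgets($fh)) !== false) {
    $word = trim($line);
    if ($word !== '' && !isset($seen[$word])) {
        $seen[$word] = true;
        $escaped = mysql_real_escape_string($word);
        mysql_query("INSERT INTO words (word) VALUES ('$escaped')");
    }
}
fclose($fh);
?>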

I have also not indexed the table, because my gut feeling was that the
overhead of constantly updating the index would cancel out the gain from
checking against an indexed table. Is that true?
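To be clear about what I mean by indexing: the idea would be something like
a unique index on the word column (MySQL syntax, names are placeholders),
which would let the database itself refuse duplicate rows:

<?php
// One-off schema change: a unique index on the word column, so the database
// rejects duplicates instead of the script SELECTing before every INSERT.
// Assumes a connection is already open, as in the first sketch.
mysql_query("ALTER TABLE words ADD UNIQUE INDEX idx_word (word)");
?>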

Any and all help, comments, etc, appreciated as always.

Thanks

Evan Morris
evan@xxxxxxxxxxxxxxxxxx
Tel: +27 11 797 3840
Fax: +27 11 803 0211
Cell: +27 82 926 3630


