
Populating huge tables each day

Hi All,

First, I apologize for the length of this email; I'm just afraid that my problem needs quite a long explanation. I also apologize for my English, which is not my native language... sorry :)

I'm currently building a site for a client that will need to update its database on a daily routine with about 3.5 million rows from data files (the files vary in size and can reach up to 1 million rows per file). The data will be retrieved from several files (text or XML, my choice). Part of the rows will need to be INSERTed because they have no previous record, and some will need to UPDATE previous records, or simply be INSERTed while the previous records are DELETEd. Besides that, the new data also carries information that tells how to populate another table (2 fields, and the result is a lot fewer rows), and the two tables are connected by a foreign key which is written in each row of the main table (the one with the 3.5 million rows).

The site must work 24/7, and it will probably have 2 servers running PostgreSQL (working as 1 database). The scripting language I'll use is PHP, if that changes anything.

I have thought about how to accomplish this, and I would like to receive comments and ideas. I'm mostly afraid of the stress it will put on the server during the process, and I need the site to keep displaying working information the whole time.

A brief explanation of the definitions I'll use ahead:
1. New data - the data retrieved from the files (text or XML).
2. Main table - the final table, which needs to hold about 3.5 million rows. It can be empty before the process, or it can already hold information, some of which needs to be updated using the "New data"; the rest of the "New data" needs to be inserted into it, and the previous rows that did not get updated need to be deleted.
3. Second table - the table holding the information that the rows of the "Main table" need to be connected to via foreign keys.
4. Temp table - a temporary table which will hold the "New data" until it is ready to INSERT/UPDATE the data in the "Main table" (it has exactly the same columns as the "Main table").
5. Temp table2 - a temporary table created with CREATE TEMPORARY TABLE AS (formerly SELECT INTO).
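To make these definitions concrete, here is a rough sketch of the schemas, only as an illustration - the table and column names (second_table, main_table, temp_table, field1, field2, payload) are placeholders, not the real ones:

    -- the table the "Main table" points to (the 2-field, fewer-rows one)
    CREATE TABLE second_table (
        id      serial PRIMARY KEY,
        field1  text NOT NULL,
        field2  text NOT NULL,
        UNIQUE (field1, field2)
    );

    -- the 3.5 million row table, carrying the foreign key to second_table
    CREATE TABLE main_table (
        id              serial PRIMARY KEY,
        second_table_id integer NOT NULL REFERENCES second_table (id),
        payload         text
        -- ... plus the rest of the columns that come from the files
    );

    -- staging table for the "New data"; in this sketch it still carries the
    -- raw field1/field2 values instead of the resolved foreign key
    CREATE TABLE temp_table (
        field1  text,
        field2  text,
        payload text
    );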

My plan (rough SQL sketches of steps 1 and 2 follow the list):
1. With COPY FROM I'll insert the data into the "Temp table" (all of the files in text format).
2. Run a PL/pgSQL function:
   A. Start a transaction.
   B. DELETE the contents of the currently existing "Second table".
   C. INSERT data into the "Second table" using the "Temp table" - each row's values will be checked (SELECT) against the "Second table", and if they do not exist there yet, an INSERT will create them. I'm afraid this process will be extremely heavy on the server.
   D. DELETE the "Main table" rows.
   E. With CREATE TEMPORARY TABLE AS I'll create "Temp table2", which will contain all of the information of the "Temp table" plus a subquery that retrieves the ID from the "Second table" for the foreign key - quite a heavy process, I suspect.
   F. INSERT the data from "Temp table2" into the "Main table".
   G. End the transaction and return from PL/pgSQL.

3. Delete all the files.
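
For step 1, the load itself would look roughly like this (the path and column list are made up, and from PHP I would probably stream the files through COPY ... FROM STDIN instead of using a server-side file path):

    COPY temp_table (field1, field2, payload) FROM '/path/to/datafile.txt';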
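And here is a rough outline of the PL/pgSQL function for step 2, using the placeholder names from the schema sketch above. I've written step C as a single INSERT ... SELECT DISTINCT instead of checking row by row, folded steps E+F into one INSERT with a join instead of a separate "Temp table2", and moved step D before step B so the foreign key doesn't complain - please tell me if any of that is a bad idea:

    CREATE OR REPLACE FUNCTION refresh_main_table() RETURNS void AS $$
    BEGIN
        -- A./G. the function body already runs inside a single transaction,
        -- so readers keep seeing the old data until it commits

        -- D. empty the "Main table" first, since its rows reference the
        --    "Second table" through the foreign key
        DELETE FROM main_table;

        -- B. now the old contents of the "Second table" can go
        DELETE FROM second_table;

        -- C. insert each distinct pair once instead of SELECTing per row
        INSERT INTO second_table (field1, field2)
        SELECT DISTINCT field1, field2 FROM temp_table;

        -- E. + F. resolve the foreign key with a join and fill the "Main table"
        INSERT INTO main_table (second_table_id, payload)
        SELECT s.id, t.payload
        FROM temp_table t
        JOIN second_table s ON s.field1 = t.field1 AND s.field2 = t.field2;

        -- empty the staging table for the next run
        DELETE FROM temp_table;

        RETURN;
    END;
    $$ LANGUAGE plpgsql;

    SELECT refresh_main_table();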

Thanks a lot in advance, and again, I'm sorry for the length of the mail :)

Ben-Nes Yonatan
Canaan Surfing ltd.
http://www.canaan.net.il

