Thanks to everyone for their response and help. I still have some
more questions that hopefully someone can help me with as I have not
yet been able to solve my vacuumdb problem. The posting of data to the table in question is extremely slow...yesterday I saw that it took over 6 min to post just 124 rows of data. That is just not acceptable. Additionally, we have about 9,000 to 11,000 products that come in daily (some contain one row of data...others may be several hundred lines) and are posted to the database. The system used to run with maybe 10-100 products in the queue (that is before I posted over 200 million rows). Yesterday, there were over 25,000 products in the queue waiting to be processed - which is bad for our operational use. I think the table got even more bloated when I tried to do the vacuum full last week and had to kill the process - it never completed. From the info I received from previous posts, I am going to change my game plan. If anyone has thoughts as to different process or can confirm that I am on the right track, I would appreciate your input. 1. I am going to run a CLUSTER on the table instead of a VACUUM FULL. But I have a few questions that are not clear from the info I have found on the internet regarding this process. The table name is 'pecrsep', and the database is hd_ob92rha. It has multiple columns in the table, but I am choosing the 'obsdate' as my index (I believe 'index' refers to the name of a column in the table?). Some of the columns are listed: lid | pe1 | pe2 | dur | idur | t | s | e | p
| obstime |
z0000 | z0015 | z0030 | z0045 | z0100 | z0115 | z0130 ... etc Anyway, if I interpret the info correctly the first time a CLUSTER is run, I have to run something like: CLUSTER obsdate ON pecrsep (CLUSTER indexname ON tablename) Is it true that the first time I run a CLUSTER, I must provide an 'indexname' or can I just run: CLUSTER pecrsep (CLUSTER tablename)? Or is it better to provide an indexname? 2. I know there are a lot of variables such as table size, etc. Are we looking at something that will take couple hours...or 24+ hours? Is there a way to monitor the CLUSTER process to ensure its working? Is there any way to get output to give an estimate as to how much of the CLUSTER has completed...and how much is left to run? 3. With the info from previous posts, I am certainly rethinking the use of VACUUM FULL. After I run the CLUSTER, should I REINDEX the table, or is that redundant to the CLUSTER? If I run the REINDEX, I would do it just for the single table: REINDEX TABLE pecrsep; Any idea as to how long this may take (i.e 1-2 hours or 24+ hours)? 4. If either/both the CLUSTER or REINDEX are successful, that would indicate that I don't need to run a VACUUM FULL...correct? 5. Then I would run a VACUUM ANALYZE as suggested. Does this sound like an appropriate plan? After I do all this on the single table, should I repeat something similar for the whole database? Or should I just attack the largest tables? One last question, can anyone recommend a good postgres book for me to purchase? I guess I should repeat that we are still running version 8.2.6 of postgres on Linux machines. Is there an appropriate book for this version available? Thanks again for your help...I hope I didn't ask too many questions, but the database is in poor shape and I need to get it working more efficiently quickly. Best Regards, Dave |
begin:vcard fn:David Ondrejik n:Ondrejik;David org:weather.gov/marfc;NOAA's National Weather Service adr;dom:Suite #330;;328 Innovation Blvd;State College;PA;16803 email;internet:david.ondrejik@xxxxxxxx title:Senior Hydrologist. Mid-Atlantic River Forecast Center tel;work:814-231-2401 version:2.1 end:vcard
-- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin