Hi Bert,
Yes, ETL process needs to load the data every day into datawarehouse database from OLTP database and there would be no DML operations executing on DWH DB which is mainly used for analyzing the data and getting the reports for making business decisions. After every data load process, statistices should be updated in order to gain the performance therefore it's best practice to add analyze command for the respective tables in ETL script itself. If auto vacuum is enabled on tables which participate in ETL process, autovacuum kicks off frequently when its threshold values met and keep running affecting data load process.
If we disable autovacuum on table level (NOT at cluster level) auto vacuum never kicks off during ETL process and ETL job get finished quickly with analyze operation completed as well.
This is a daily process therefore you no need to worry about the autovacuum kick off and analyze operations which get executed manually from ETL job.
Thanks & Regards,
Prashanth Ranjalkar
Database Consultant & Architect
Skype:prashanth.ranjalkar
Cell: +91 932 568 2271
On Thu, Jan 24, 2013 at 11:56 AM, Bert <biertie@xxxxxxxxx> wrote:
Hello,
I can see why that would be a good practice if you do a daily load, but the goal is to have a 'live' dwh. with updated statistics every so-many-hours.
Thank you for the information anyway. I guess I will ask more questions here when time passes. And I'll try to share some knowlegde too! :)
cheers,
Bert--
On Wed, Jan 23, 2013 at 4:45 PM, Prashanth Ranjalkar <prashant.ranjalkar@xxxxxxxxx> wrote:
Hi Bert,Vaccum analyze operation would be a time consuming activity when it operates on partitioned table in parent and child relationship by using a manual vaccum option. When vaccum operation is performed the total vacuum/analyze time would be total time on completion of the said actvity on all child table in spite of parent table is empty. If you are specifically mentioning the child and parent table names in the script then the timing would be different and independent of each table which would be less..To answer your second question, it's good to disable autovacuum on table level for those tables which are part of ETL data load operation to avoid the interfierence of autovaccum while loading data and it would be good practice to analyze the tables from ETL script itself so that stats are up to date to get throughput of the application query perfoamance.Thanks & Regards,Prashanth RanjalkarDatabase Consultant & ArchitectSkype:prashanth.ranjalkarCell: +91 932 568 2271
On Wed, Jan 23, 2013 at 8:16 PM, Bert <biertie@xxxxxxxxx> wrote:
but thank you for giving me some clarification.Yes, at the moment that is still the case.We got big big problems with db2 auto runstats (sort of auto analyze) kicking in at random moments, that's the main reason we want to control it with postgres too.
We are migrating from db2 to postgres, and are still in progress of migrating our ETL tool. So there isn't much moment in those tables at the moment.
I am still wondering why the children need to be analyzed, if we vacuum/analyze the childs seperatly.
cheers,
Bert
--
On Wed, Jan 23, 2013 at 3:40 PM, Rural Hunter <ruralhunter@xxxxxxxxx> wrote:
I'm doing the same thing. In my case, the vacuum part on parent is very quick while analyzing takes a bit longer since it runs rough analyzes all children tables. You can see the behavior by "analyze verbose". Maybe the bigger part of your vacuum/analyze is on analyze so that you are seeing this result.
于 2013/1/23 19:43, Bert 写道:--
Hello,
I first wrote, by mistake, to the sql mailing list. But here is my e-mail:
I wrote a script to make sure all tables are vacuumed and analyzed every evening. This works very well.
I save in a table the start and end time of a vacuum/analyze. This way I can measure what tables take a long time to vaccum/analyze, and what tables are slow. (and much more).
But I have noticed that the parent table of a partitioned table also takes a long time. Here is a snap shot of the following table
table_name ; avg runt time ; max run time ; min run time
"f_transaction_1" ; "00:03:07.8" ; "00:03:10" ; "00:03:03"
"f_transaction" ; "00:02:19.8" ; "00:02:25" ; "00:02:16"
f_tranaction_1 is 16GB data + 12GB of indexes. (I know, a lot of indexes). f_tranaction is totally empy, but also contains all indexes. Which means 0B table zise, and 140kB index size.
Does anyone has an idea why in this case the vacuum/analyze takes almost as long on the parent table as on the biggest child table? (the other child tables are smaller than f_tranaction_1, and their vacuum/analyze time is much shorter).
wkr,
Bert
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Bert Desmet
0477/305361