For people doing data warehousing work like the poster, this Postgres behavior (hint bits getting set, and pages rewritten, on the first read after a bulk load) is miserable. It should be fixed for 8.4 for sure (volunteers?).
BTW – for the poster’s benefit, you should implement partitioning by date, then load each partition and VACUUM ANALYZE after each load. You probably won’t need the date index anymore – so your load times will vastly improve (no indexes), you’ll store less data (no indexes) and you’ll be able to do simpler data management with the partitions.
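As a rough sketch of that create / load / VACUUM ANALYZE cycle, here is a small Python helper that only builds the SQL text for one daily partition, using inheritance plus a CHECK constraint. The table name "events", the column "event_time", and the data file path are made up for illustration, and constraint_exclusion needs to be turned on for the planner to use the CHECK constraints.

```python
from datetime import date, timedelta


def partition_statements(parent, column, day, data_file):
    """Build the SQL to create, load, and vacuum one daily partition.

    All names passed in are hypothetical; this only formats strings
    and does not talk to a database.
    """
    child = f"{parent}_{day:%Y%m%d}"
    next_day = day + timedelta(days=1)
    return [
        # Child table holding exactly one day, enforced by a CHECK constraint.
        f"CREATE TABLE {child} ("
        f"CHECK ({column} >= DATE '{day}' AND {column} < DATE '{next_day}')"
        f") INHERITS ({parent});",
        # Bulk load straight into the child; no index to maintain.
        f"COPY {child} FROM '{data_file}';",
        # One pass right after the load to set hint bits and gather stats.
        f"VACUUM ANALYZE {child};",
    ]


if __name__ == "__main__":
    for stmt in partition_statements(
        "events", "event_time", date(2008, 5, 15), "/tmp/events_20080515.csv"
    ):
        print(stmt)
```

Running the VACUUM ANALYZE per partition keeps the cost proportional to the day just loaded rather than to the whole table.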
You may also want to partition AND index if you run a lot of queries with short, selective date-range predicates. For example: partition by day and index the date field. A query that is selective on a range of hours will then touch only the partition for the day needed, and use an index scan within it to get the hourly values. Typically time-oriented data is nearly time sorted anyway, so you’ll also get the benefit of a clustered index.
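To make the partition-plus-index case concrete, here is a small Python sketch. It assumes daily child tables named like "events_YYYYMMDD" (a hypothetical naming scheme) and works out which of them an hourly range would have to touch. It is only a model of the pruning the planner does from the CHECK constraints, not the planner's actual logic.

```python
from datetime import datetime, timedelta


def index_statement(child, column):
    """SQL text for a per-partition index on the date column (names hypothetical)."""
    return f"CREATE INDEX {child}_{column}_idx ON {child} ({column});"


def partitions_touched(parent, start, end):
    """Daily partitions whose range overlaps the half-open interval [start, end).

    Assumes start < end and one child table per calendar day.
    """
    day = start.date()
    # Step back one microsecond so an end of exactly midnight
    # does not pull in the following day's partition.
    last = (end - timedelta(microseconds=1)).date()
    names = []
    while day <= last:
        names.append(f"{parent}_{day:%Y%m%d}")
        day += timedelta(days=1)
    return names


if __name__ == "__main__":
    # One hour inside a single day: only that day's partition is needed.
    print(partitions_touched("events", datetime(2008, 5, 15, 9), datetime(2008, 5, 15, 10)))
    print(index_statement("events_20080515", "event_time"))
```

Within the one partition that survives, the index on the date field narrows the scan down to the requested hours.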
- Luke
On 5/15/08 10:40 AM, "Pavan Deolasee" <pavan.deolasee@xxxxxxxxx> wrote:
On Thu, May 15, 2008 at 7:51 AM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:
>
>
> So is vacuum helpful here because it will force all that to happen in one
> batch? To put that another way: if I've run a manual vacuum, is it true
> that it will have updated all the hint bits to XMIN_COMMITTED for all the
> tuples that were all done when the vacuum started?
>
Yes. For that matter, even a plain SELECT or count(*) on the entire
table is good enough. That will check every tuple for visibility and
set its hint bits.
Another point to note is that the hint bits are checked and set on a
per-tuple basis. So, especially during an index scan, the same heap page
may get rewritten many times. I had suggested in the past that
whenever we set hint bits for a tuple, we should check all other
tuples in the page and set their hint bits too to avoid multiple
writes of the same page. I guess the idea got rejected because of lack
of benchmarks to prove the benefit.
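
To illustrate the difference, here is a toy Python model. It is not how the PostgreSQL buffer manager works: it assumes a cache that holds a single page and writes a dirty page out whenever it is evicted, which is a deliberately worst-case setup. It counts page writes when hint bits are set one tuple at a time versus for the whole page on first touch.

```python
def count_page_writes(visit_order, tuples_per_page, whole_page_hints):
    """Count page writes in a toy one-page buffer cache.

    visit_order: tuple ids in the order a scan touches them.
    whole_page_hints: if True, the first visit to a page sets the hint
    bits of every tuple on it; otherwise only the visited tuple's bits.
    """
    hinted = set()        # tuple ids whose hint bits are already set
    writes = 0
    cached_page = None
    cached_dirty = False
    for tid in visit_order:
        page = tid // tuples_per_page
        if page != cached_page:
            # Evict the previous page; a dirty page costs one write.
            if cached_dirty:
                writes += 1
            cached_page, cached_dirty = page, False
        if tid not in hinted:
            if whole_page_hints:
                first = page * tuples_per_page
                hinted.update(range(first, first + tuples_per_page))
            else:
                hinted.add(tid)
            cached_dirty = True
    if cached_dirty:
        writes += 1       # final flush of the last page
    return writes


if __name__ == "__main__":
    pages, per_page = 4, 3
    # Index-scan-like order that hops to a different page on every visit.
    scattered = [p * per_page + slot for slot in range(per_page) for p in range(pages)]
    print(count_page_writes(scattered, per_page, whole_page_hints=False))  # 12
    print(count_page_writes(scattered, per_page, whole_page_hints=True))   # 4
```

With a sequential visit order both variants write each page once in this model, which matches the point that the repeated rewrites show up mainly with index scans.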
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)