Re: I/O on select count(*)

BTW, we’ve removed hint-bit checking in Greenplum DB and improved the visibility caching, which was enough to provide performance at the same level as with the hint-bit optimization while avoiding this whole “write the data, write it to the log also, then write it again just for good measure” behavior.

For people doing data warehousing work like the poster, this Postgres behavior is miserable. It should be fixed for 8.4 for sure (volunteers?).

For the poster’s benefit: you should implement partitioning by date, then load each partition and run VACUUM ANALYZE after each load. You probably won’t need the date index anymore, so your load times will vastly improve (no index maintenance), you’ll store less data (no index storage), and data management becomes simpler with partitions.

You may also want to partition AND index if you run a lot of queries with selective predicates on short date ranges. For example: partition by day and index the date field. A query that selects a range of hours will then touch only the partition for the day needed and use an index scan within it to get the hourly values. Time-oriented data is typically nearly time-sorted anyway, so the index will also behave much like a clustered index.
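As a rough sketch of the load cycle described above, the following Python helper builds the SQL for one daily partition. It assumes the inheritance-style partitioning available in PostgreSQL 8.x (a child table with a CHECK constraint, with constraint_exclusion enabled so the planner can skip other days). The table name `events`, the column `event_date`, and the helper itself are hypothetical, not taken from the poster's schema.

```python
from datetime import date, timedelta


def daily_partition_sql(parent, date_column, day, with_index=False):
    """Build SQL for one daily partition of `parent` (names are illustrative).

    Returns a dict with the statements to run before and after the bulk load.
    """
    child = f"{parent}_{day:%Y_%m_%d}"
    next_day = day + timedelta(days=1)

    # Child table holding exactly one day, enforced by a CHECK constraint.
    create = (
        f"CREATE TABLE {child} "
        f"(CHECK ({date_column} >= DATE '{day}' AND {date_column} < DATE '{next_day}')) "
        f"INHERITS ({parent});"
    )

    after_load = []
    if with_index:
        # Optional: build the index after loading, for selective hourly queries.
        after_load.append(f"CREATE INDEX {child}_{date_column}_idx ON {child} ({date_column});")
    # VACUUM ANALYZE right after the load, as suggested in the text.
    after_load.append(f"VACUUM ANALYZE {child};")

    return {"before_load": [create], "after_load": after_load}


if __name__ == "__main__":
    plan = daily_partition_sql("events", "event_date", date(2008, 5, 15), with_index=True)
    for stmt in plan["before_load"]:
        print(stmt)
    print("-- bulk load the day's rows into the partition here")
    for stmt in plan["after_load"]:
        print(stmt)
```

Leaving `with_index` off corresponds to the partition-only case; turning it on corresponds to the partition AND index case for short-range date predicates.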

- Luke


On 5/15/08 10:40 AM, "Pavan Deolasee" <pavan.deolasee@xxxxxxxxx> wrote:

On Thu, May 15, 2008 at 7:51 AM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:
>
>
> So is vacuum helpful here because it will force all that to happen in one
> batch?  To put that another way:  if I've run a manual vacuum, is it true
> that it will have updated all the hint bits to XMIN_COMMITTED for all the
> tuples that were all done when the vacuum started?
>

Yes. For that matter, even a plain SELECT or count(*) on the entire
table is good enough. That will check every tuple for visibility and
set its hint bits.

Another point to note is that the hint bits are checked and set on a
per-tuple basis. So, especially during an index scan, the same heap page
may get rewritten many times. I had suggested in the past that
whenever we set hint bits for a tuple, we should check all other
tuples in the page and set their hint bits too, to avoid multiple
writes of the same page. I guess the idea got rejected for lack
of benchmarks to prove the benefit.
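
To illustrate the difference between the two policies, here is a toy model in Python. It is not PostgreSQL code and not a benchmark. It makes the worst-case assumption that a dirtied page is flushed to disk before the scan returns to it, so every hint-bit update on a page costs one page write. The function and parameter names are made up for this sketch.

```python
import random


def count_page_writes(visit_order, tuples_per_page, set_whole_page):
    """Count page writes caused by hint-bit updates in a toy heap model.

    Worst-case assumption: a dirtied page is flushed before its next visit,
    so each hint-bit update event costs one page write.
    """
    hinted = set()  # tuple ids whose hint bit is already set
    writes = 0
    for tid in visit_order:
        if tid in hinted:
            continue  # hint bit already set, page is not dirtied again
        page = tid // tuples_per_page
        if set_whole_page:
            # Proposed policy: set hint bits for every tuple on the page at once.
            start = page * tuples_per_page
            hinted.update(range(start, start + tuples_per_page))
        else:
            # Per-tuple policy: only the visited tuple gets its hint bit.
            hinted.add(tid)
        writes += 1
    return writes


if __name__ == "__main__":
    tuples_per_page = 50
    order = list(range(4 * tuples_per_page))  # 4 heap pages
    random.Random(0).shuffle(order)           # index order unrelated to heap order
    print("per-tuple: ", count_page_writes(order, tuples_per_page, False))
    print("whole-page:", count_page_writes(order, tuples_per_page, True))
```

Under this assumption the per-tuple policy costs one write per tuple and the whole-page policy one write per page. With a real buffer cache, a page that stays in memory between visits would absorb several updates, so the actual gap would be smaller.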

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

