By the way, we ran self-joins of that table fairly regularly:
FROM bigtable lhs, bigtable rhs
    AND '' > lhs.timestamp AND lhs.timestamp >= ''
    AND '' > rhs.timestamp AND rhs.timestamp >= ''
    AND lhs.timestamp = rhs.timestamp
    AND lhs.foo = rhs.foo
    AND lhs.bar = rhs.bar
This query leaned heavily on the timestamp index, and we had to be careful to restrict it to only a few days at a time. It took a few minutes each run, but it was definitely doable.
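For anyone trying something similar: before running one of these, it's worth a quick EXPLAIN to confirm the range conditions actually use the timestamp index. A minimal sketch of that check; the date bounds and select list below are placeholders, not the real values:

    -- placeholder bounds and columns, purely for illustration
    EXPLAIN
    SELECT lhs.foo, lhs.bar, lhs.timestamp
    FROM bigtable lhs, bigtable rhs
    WHERE lhs.timestamp >= '2010-05-01' AND '2010-05-04' > lhs.timestamp
      AND rhs.timestamp >= '2010-05-01' AND '2010-05-04' > rhs.timestamp
      AND lhs.timestamp = rhs.timestamp
      AND lhs.foo = rhs.foo
      AND lhs.bar = rhs.bar;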
Once you get this large you do have to be careful with a few things though:
*It's fairly easy to write extremely long-running queries or updates. This can leave lots of dead rows in your tables. Limit your longest-running statements to a day or so; plain queries are unlikely to take that long, but updates over massive date ranges could (one way to chunk such an update is sketched after this list). For scale, SELECT COUNT(*) FROM bigtable took about 30 minutes when the server wasn't under heavy load.
*You sometimes get bad plans because:
**You don't or can't get enough statistics about a column (raising the statistics target, sketched after this list, can help here).
**PostgreSQL doesn't capture statistics about two columns together. PostgreSQL has no way of knowing that columnA = 'foo' implies columnB = 'bar' about 30% of the time.
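On the dead-rows point, one way to keep a big update from running for hours is to chunk it by date range and vacuum between chunks. A rough sketch; the SET clause and bounds are made up:

    -- hypothetical backfill, one day per statement so no single
    -- transaction runs for hours or leaves a huge pile of dead rows
    UPDATE bigtable
       SET bar = 'corrected'
     WHERE bigtable.timestamp >= '2010-05-01'
       AND bigtable.timestamp <  '2010-05-02';
    VACUUM bigtable;
    -- repeat with the next day's bounds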
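And on the per-column statistics, raising the statistics target before ANALYZE can give the planner a better picture of a skewed column. A sketch, with an example column and target value:

    -- hypothetical column and target; more histogram buckets than the default
    ALTER TABLE bigtable ALTER COLUMN foo SET STATISTICS 1000;
    ANALYZE bigtable;

This doesn't fix the cross-column case, but it helps when a single column's distribution is just being sampled too thinly.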
Nik
On Thu, May 27, 2010 at 5:58 AM, Massa, Harald Armin <chef@xxxxxxx> wrote:
Dann,

There really are domains that big, for which no amount of normalization or other processing can mitigate the problem.
Examples:
Microsoft's registered customers database (all MS products bought by any customer, including operating systems)
Tolls taken on the New Jersey road system for FY 2009
DNA data from the Human Genome Project

Please also think of our most risk-exposed users, the ones using Poker / Roulette simulation and analysis software with a PostgreSQL database underneath. There are so many rounds of Poker to play ... :)

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
Using PostgreSQL is mostly about sleeping well at night.