Re: perf problem with huge table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Wed, Feb 10, 2010 at 4:16 PM, Dave Crooke <dcrooke@xxxxxxxxx> wrote:
Hi Rama

I'm actually looking at going in the other direction ....

I have an app using PG where we have a single table where we just added a lot of data, and I'm ending up with many millions of rows, and I'm finding that the single table schema simply doesn't scale.

In PG, the table partitioning is only handled by the database for reads, for insert/update you need to do quite a lot of DIY (setting up triggers, etc.) so I am planning to just use named tables and generate the necessary DDL / DML in vanilla SQL the same way that your older code does.

My experience is mostly with Oracle, which is not MVCC, so I've had to relearn some stuff:

Just a nit, but Oracle implements MVCC.  90% of the databases out there do.
 
- Oracle often answers simple queries (e.g. counts and max / min) using only the index, which is of course pre-sorted. PG has to go out and fetch the rows to see if they are still in scope, and if they are stored all over the place on disk it means an 8K random page fetch for each row. This means that adding an index to PG is not nearly the silver bullet that it can be with some non-MVCC databases.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut feel, I haven't been doing true comparisons ...  however, for my app I have limited myself to only two indexes on that table, and each index is larger (in disk space) than the table itself ... I have 60GB of data and 140GB of indexes :-)

- There is a lot of row turnover in my big table (I age out data) .... a big delete (millions of rows) in PG seems a bit more expensive to process than in Oracle, however PG is not nearly as sensitive to transaction sizes as Oracle is, so you can cheerfully throw out one big "DELETE from FOO where ..." and let the database chew on it .

I find partitioning pretty useful in this scenario if the data allows is.  Aging out data just means dropping a partition rather than a delete statement.


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux