On May 2, 2008, at 2:02 PM, Craig James wrote:
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov
<deliverable@xxxxxxxxx> wrote:
I naively thought that if I have a 100,000,000-row table of the form
(integer, integer, smallint, date), and add a real column to it, it
would scroll through memory reasonably fast.
In Postgres, an update is the same as a delete/insert. That means
that changing the data in one column rewrites ALL of the columns for
that row, and you end up with a table that's 50% dead space, which
you then have to vacuum.
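
For instance, the slow path looks roughly like this (the table and
column names here are made up):

    -- adding the column itself is cheap (a catalog-only change),
    -- but filling it in rewrites every one of the 10^8 rows:
    ALTER TABLE facts ADD COLUMN score real;
    UPDATE facts SET score = 0.0;   -- leaves 10^8 dead row versions
    VACUUM ANALYZE facts;           -- reclaims the dead space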
Sometimes if you have a "volatile" column that goes with several
"static" columns, you're far better off creating a second table for
the volatile data, duplicating the primary key in both tables. In
your case, it would mean the difference between 10^8 inserts of
(int, float), which are very fast, and what you're doing now:
10^8 inserts and 10^8 deletes of (int, int, smallint, date, float),
followed by a big vacuum/analyze (also slow).
The downside of this design is that later on, it requires a join to
fetch all the data for each key.
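
A sketch of that layout (names are hypothetical), with the join that
reassembles the rows:

    CREATE TABLE facts_score (
        id    integer PRIMARY KEY,  -- duplicates facts' primary key
        score real
    );
    -- 10^8 plain inserts, no dead tuples left behind in facts:
    INSERT INTO facts_score (id, score)
        SELECT id, 0.0 FROM facts;
    -- later, a join fetches all the data for each key:
    SELECT f.*, s.score
    FROM facts f JOIN facts_score s ON s.id = f.id;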
You do have a primary key on your data, right? Or some sort of index?
I created several indices for the primary table, yes. Sure, I can
create a table for a volatile column, but then I'll have to create a
new such table for each derived column -- that's why I tried to add a
column to the existing table. Yet seeing how slow this is, and since
I need to do many derived analyses like this -- which are later
scanned in other computations, so they should persist -- I indeed see
no other way but to create derived tables with the same key, one
column each...
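
Roughly, a sketch of what each such table would look like (table,
column, and expression names are made up):

    CREATE TABLE derived_foo (
        id  integer PRIMARY KEY,   -- same key as the base table
        foo real
    );
    INSERT INTO derived_foo (id, foo)
        SELECT id, a * b           -- whatever the derived expression is
        FROM facts;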
Cheers,
Alexy