On Thu, Sep 1, 2011 at 6:38 AM, Rik Bellens <rik.bellens@xxxxxxxxxxxxxx> wrote:
> On 01-09-11 14:22, Scott Marlowe wrote:
>>
>> On Thu, Sep 1, 2011 at 6:05 AM, Rik Bellens <rik.bellens@xxxxxxxxxxxxxx> wrote:
>>>
>>> On 01-09-11 13:31, Scott Marlowe wrote:
>>>>
>>>> On Thu, Sep 1, 2011 at 4:32 AM, Rik Bellens <rik.bellens@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have two large tables in a database, one containing original data
>>>>> and the other one derived from the first. The first table contains
>>>>> several columns and indexes, while the second table has fewer
>>>>> columns and only one index. Both tables have the same number of
>>>>> rows. Nevertheless, the second table is much larger on disk than
>>>>> the first one. How can this be explained?
>>>>
>>>> This is most likely due to table bloat. In PostgreSQL, when you
>>>> update or delete a row, a dead version gets left behind. Vacuum
>>>> eventually comes along and reclaims the empty space for reuse. If
>>>> you delete or update a LOT of rows at once, you'll have a lot of
>>>> dead rows whose space can only be reused, after vacuuming, by
>>>> inserts or updates you do later on.
>>>>
>>>> A few salient questions: What version of PostgreSQL are you
>>>> running? Is autovacuum running? Do you do a LOT of bulk deletes /
>>>> updates? If you do a lot of bulk deletes on this table, and you
>>>> delete everything, can you switch to using the TRUNCATE command
>>>> instead?
>>>
>>> I use version 8.3. I see the 'autovacuum launcher process' and
>>> 'autovacuum worker process' in the process list, so I suppose
>>> autovacuum is running.
>>>
>>> Rows in the measurement table are added once and never deleted or
>>> updated. Adding a row to this table fires a trigger function that
>>> adds a row to the stats_count table. Rows are normally added in
>>> chronological order, so rows in the stats_count table are normally
>>> not updated either. If, however, a measurement is added for an
>>> older time, all rows of that device that come after that time are
>>> updated, but I don't think this happens very often.
>>>
>>> The table 'stats_count' was created at a later stage, so the first
>>> 45M rows were added at once and chronologically. However, because
>>> the function that initializes this table took a long time and the
>>> client application crashed a few times, I had to restart this
>>> function several times. Can it be that there is some garbage left
>>> over from running this function several times without finishing it?
>>> Would running 'VACUUM FULL' be a solution to reclaim some disk
>>> space?
>>
>> Yeah, could be. Take a look at this page:
>> http://wiki.postgresql.org/wiki/Show_database_bloat and see if the
>> query there sheds some light on your situation.
>
> Thanks for this answer.
>
> If I run the query, I get 12433752064 wasted bytes on
> stats_count_pkey, so I suppose that is the reason.

Sounds like it. Take a look here:
http://wiki.postgresql.org/wiki/Index_Maintenance
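
To see where the space is actually going, it helps to compare heap and
index sizes directly. A minimal sketch, assuming the tables are
literally named 'measurement' and 'stats_count' (adjust to your real
names):

    -- heap size vs. total size (heap + indexes + toast) per table
    SELECT relname,
           pg_size_pretty(pg_relation_size(oid))       AS heap_size,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relname IN ('measurement', 'stats_count');

    -- size of each index on the derived table
    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE relname = 'stats_count';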
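
You can also check whether autovacuum is keeping up; 8.3 tracks live
and dead row counts per table in the statistics collector. Same
assumed table names as above:

    -- dead rows still awaiting vacuum, and when autovacuum last ran
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname IN ('measurement', 'stats_count');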
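
If the wasted space really is concentrated in stats_count_pkey, note
that plain VACUUM will not shrink an index; rebuilding it will. A
sketch of the straightforward fix; be aware that REINDEX blocks writes
to the table while it runs, so schedule it for a quiet moment:

    -- rebuild the bloated primary key index from scratch
    REINDEX INDEX stats_count_pkey;

    -- then verify how much space came back
    SELECT pg_size_pretty(pg_relation_size('stats_count_pkey'));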