> -----Original Message----- > From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx] > Sent: Tuesday, July 12, 2005 10:14 AM > > "David Esposito" <pgsql-general@xxxxxxxxxxxxxxxxxxxxx> writes: > > As promised, here are two runs of VACUUM VERBOSE on the > problem table ... > > There was a lot of activity on the campaign_email table on Friday > > (Saturday's VACUUM) as compared with Monday (Tuesday's VACUUM) > > Well, what these numbers show is that you have 5% to 10% > daily turnover > of data in this table (maybe more --- are these two days > representative, > do you think)? But anyway, taking that number as gospel, you'd expect > that the table and indexes would settle at about 10% free space > immediately after each VACUUM. That would represent a steady state: > just enough free space to get eaten up till the next VACUUM. > The table > itself seems to have stabilized, and the "email_campaign_idx" index as > well -- note the latter didn't grow at all, and its internal > free space > is in the 10% ballpark: > > > INFO: index "email_campaign_idx" now contains 5881215 row > versions in 31435 > > pages > > DETAIL: 501822 index row versions were removed. > > 2016 index pages have been deleted, 896 are currently reusable. > > > INFO: index "email_campaign_idx" now contains 5583831 row > versions in 31435 > > pages > > DETAIL: 280860 index row versions were removed. > > 3266 index pages have been deleted, 2531 are currently reusable. > > I'm not sure why the other indexes don't seem to have reached their > steady-state 10% free yet. What can you tell us about the patterns > of data being inserted into these various indexes? > These two days are fairly representative ... Friday's being a bit above average, Monday's being a bit below ... the 10% turnover number sounds about right ... and I expected to see what you describe: a rough 10% slack in the data files which is why I am concerned about the growth of just a few specific indexes ... the data in this table is the past 50 days worth of history for an emailing application ... when we send the email, we create the record then use that record to track the bounce, opt-out, impression, and click activity for up to 50 days then we roll up all of the data into a statistics table and purge it ... As for the type of data that's in each of the indexes: campaign_email_pkey - the primary key on the table to uniquely identify the email sent; it's value is generated from a sequence ... obviously, the values in this column don't change once the record is created .. campaign_patron_unq - a concatenated index which ensures that a given email address only receives one copy of a given email ... the values in the two columns in this index don't change once the record is created email_patron_idx - an index on the email address (actually an integer FK which points at a table containing the email addresses) ... very high cardinality ... i'd estimate that for a given key, there are at most 4 or 5 records ... the values in this column doesn't change once the record is created email_campaign_idx - an index on the email content that the person was sent (an integer FK which points at the content in a seperate table) ... the cardinality for this is pretty low ... there are probably only a few hundred unique values in this index (still enough to make the index worthwhile, but for a given key, thousands of records could come back) ... the values in this column doesn't change once the record is created The last detail that I thought of that shouldn't have any impact but I figured was worth mentioning: all of these fields are NUMERIC(11,0) (it's a long story involving an automated tool used to generate a schema definition from an E/R diagram ... they should certainly just be INTEGERs, but that's water under the bridge at this point .. :-) Thanks again for your time and help, Dave ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend