"David Esposito" <pgsql-general@xxxxxxxxxxxxxxxxxxxxx> writes: > ... and the way new keys are > inserted into the index is to always add them to a new page (where the 'new' > page is either a truly new page, or a page that is completely empty), rather > than using up some of the fragmented space within existing pages? Well, they are added to a page associated with the key range they are in. You can't just stuff keys into any randomly chosen index page; otherwise the index wouldn't be readily searchable, which is more or less the entire point of an index. >> The testing I've been doing so far involves UPDATEs that touch a >> uniformly distributed subset of the table --- maybe that's the aspect >> that is failing to match your reality. Do you mostly update >> recently-added rows? Can you quantify the effect at all? > This is probably a good point ... The histogram of UPDATE activity to the > table probably looks something like > 90% - records created < 24 hours ago > 7% - records created 1 - 2 days ago > 2% - records created 2 - 7 days ago > 1% - records older than 7 days Ah; now I think things are starting to make sense. We had already estimated that about 10% of the records are updated each day, but what this says is that the majority of those updates happen to records that have never yet seen a VACUUM --- which is to say the last 2% (1/50th) of the table. So roughly speaking, the average record is updated about five times in its first day of existence, and on average less than once during its remaining 49 days of existence? Given those sorts of numbers, what we've got is that by the time the nightly VACUUM runs, the leading-edge part of the index (today's entries) has been bloated to about 5x what its minimum size would be. And what this means is that the steady-state situation will be that sort of density throughout the whole index. The difficulty is that the btree code will only reclaim entirely-empty index pages for reuse. 
Given that an index on integer keys can fit about 500 keys per page,
even a 5x-bloated index still has about 100 keys per page, making it
fairly unlikely for any of the pages to go completely empty until you
purge that whole range of keys at the end of the record's life.

This is a situation where it'd be nice if VACUUM could merge adjacent
partly-full index pages so as to reclaim space before the pages go
entirely empty on their own.  We looked at doing that when the btree
compaction code was first written, but set it aside as too hard because
of various concurrency problems.  (I think this is actually the first
case I've seen reported from the field where that decision looks wrong.
You've got a pretty odd update distribution here --- not so much that
the skew to recent entries is surprising, as that the index keys are
all on non-updating fields, so there's no spreading out of the index
ranges affected by the updates.)

Barring someone stepping up and making page merging happen (don't hold
your breath), it seems you've basically got two alternatives:

1. VACUUM more often than once per day.
2. Live with a steady-state index size that's about 5x the minimum.

I'd recommend taking a hard look at choice #1 --- you could experiment
with using the "vacuum cost delay" settings to tamp down VACUUM's I/O
demand to the point where it doesn't kill interactive performance, and
then run it maybe every hour or two on the table(s) where you have this
problem.

(Memo to hackers: this is a fairly interesting case for autovacuum, I
think.  The overall update rate on the table is not high enough to
trigger frequent vacuums, unless autovacuum is somehow made aware that
particular index key ranges are getting hit more heavily than others.
Maybe this says that autovac needs to be tracking btree index page
splits, or some such statistic, more than just overall updates.)
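A minimal sketch of what choice #1 might look like in practice.  The
GUC names are the real cost-based vacuum delay settings, but the values
shown are only starting points to experiment with, and the database and
table names (mydb, mytable) are placeholders:

```
# postgresql.conf --- make VACUUM sleep periodically instead of running
# flat-out, so it doesn't swamp interactive queries:
vacuum_cost_delay = 10       # sleep 10 ms each time the cost limit is hit
vacuum_cost_limit = 200      # amount of page-I/O "cost" between sleeps

# crontab --- then vacuum just the problem table every hour:
# 0 * * * *  psql -d mydb -c 'VACUUM ANALYZE mytable'
```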
			regards, tom lane