Thanks Pavan! I think the most important points are still that: 1. The WAL write should be happening asynchronously (if that is possible) 2. There should be an option do not perform these compactions if the page is only touched by reads. (Assuming that when most of the databaseresides in the cache these optimizations are less important.) -- Lars ----- Original Message ----- From: Pavan Deolasee <pavan.deolasee@xxxxxxxxx> To: Merlin Moncure <mmoncure@xxxxxxxxx> Cc: lars <lhofhansl@xxxxxxxxx>; Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx>; Ivan Voras <ivoras@xxxxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxx Sent: Wednesday, July 27, 2011 7:15 AM Subject: Re: UPDATEDs slowing SELECTs in a fully cached database On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote: ... There are couple of other (very important) things that HOT does, but probably its not advertised a lot. Even for non-HOT updates (which means either indexed columns were changed or page ran out of free space) or deletes, HOT prunes those tuples and instead mark the line pointer as DEAD. The page is defragmented and dead space is recovered. Each such dead tuple now only consumes two bytes in the page until vacuum removes the dead line pointers. Thats the reason why OP is seeing the behavior even when index columns are being updated. We made a few adjustments to ensure that a page is not pruned too early. So we track the oldest XID that did any updates/deletes to the page and attempt pruning only when the RecentXmin is past the XID. We also mark the page as "full" if some previous update did not find enough free space to do in-block update and use that hint to decide if we should attempt to prune the page. Finally, we prune only if we get the cleanup lock without blocking. What might be worth looking at this condition in pruneheap.c: /* * We prune when a previous UPDATE failed to find enough space on the page * for a new tuple version, or when free space falls below the relation's * fill-factor target (but not less than 10%). * * Checking free space here is questionable since we aren't holding any * lock on the buffer; in the worst case we could get a bogus answer. It's * unlikely to be *seriously* wrong, though, since reading either pd_lower * or pd_upper is probably atomic. Avoiding taking a lock seems more * important than sometimes getting a wrong answer in what is after all * just a heuristic estimate. */ minfree = RelationGetTargetPageFreeSpace(relation, HEAP_DEFAULT_FILLFACTOR); minfree = Max(minfree, BLCKSZ / 10); if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree) { So if the free space in a page falls below the fill-factor or 10% of the block size, we would try to prune the page. We probably need to revisit this area and see if we need to tune HOT ever better. One option could be to see how much space we are going to free and carry out the operation only if its significant enough to justify the cost. I know we had done several benchmarking tests while HOT development, but the tuning mechanism still may not be perfect for all kinds of work loads and it would probably never be. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance