Hi, Le Tuesday 05 February 2008 21:28:31 Alvaro Herrera, vous avez écrit : > Perhaps there is a user-level document somewhere. I don't think such a document already exists, even rtfm_please doesn't know about any as of now. So I'll try to begin something here, and depending on the comments I'll publish a short user oriented HOT introduction article. Here we go... PostgreSQL implements HOT (Heap Only Tuples), a way for the server to limit the work it has to make when updating tuples. That's what we call an optimization :) PostgreSQL MVCC implementation choice means that updating a tuple create a entire new version of it and mark the old one as no longer valid (as of the updating transaction id). Then VACUUM will have to clean out the old reference as soon as possible. Let's not forget that the indexes pointing the the old tuples need to point to the new version of it as of transaction id. PostgreSQL currently does not save visibility information into the index, though, reducing the janitoring here. But still, for the index, the operation of updating a tuple is equivalent to a delete and an insert. That's before HOT. Starting with PostgreSQL 8.3, when a tuple is updated and if the update only concerns non-indexed columns, the RDBMS is smart enough for the existing indexes not to need any update at all. This is done by creating a new tuple if possible on the same page as the old one, and maintaining a chain of updated tuples linking a new version to the old one. An HOT tuple is in fact one that can't be reached from any index. VACUUM will now only have to prune the tuple versions of the chain that are no more visible, and as no index were updated (there was no need to), there's no VACUUM work to get done on the indexes. Of course, for HOT to work properly, PostgreSQL has now to follow each HOT chain when SELECT'ing tuples and using an index, but the same amount of tuples version was to be read before HOT too. The difference is that with HOT the new versions of the HOT-updated tuples are no more reachable via the index directly, so PostgreSQL has to follow the chain when reading the heap. Please comment and correct me if my understanding is wrong (which wouldn't be a surprise), if this article over simplified, or not really written in English :) I'd like to publish a correct version of this for us to point asking users to, or maybe it could even end up as official documentation/FAQ material? Regards, -- dim
Attachment:
signature.asc
Description: This is a digitally signed message part.