Matthew Wakeling wrote:
Is it really safe to update the hint bits in place? If there is a power cut in the middle of writing a block, is there a guarantee from the disc that the block will never be garbled?
Don't know, to be honest. We've never seen any reports of corrupted data that would suggest such a problem, but it doesn't seem impossible to me that some exotic storage system might do that.
Is there a way to make a shortcut and have the hint bits written the first time the data is written to the table? One piece of obvious low-hanging fruit would be to enhance step five above, so that the bgwriter or checkpoint that writes the data to the database table checks the pg_clog and writes the correct hint bits.
Yep, that's an idea that's been suggested before. In fact, I seem to remember a patch to do just that. Don't remember what happened to it,
In fact, if the tuple's creating transaction has aborted, then the tuple can be vacuumed right there and then before it is even written.
Not if you have any indexes on the table. To vacuum, you'll have to scan all indexes to remove pointers to the tuple.
However, this idea does not deal well with bulk data loads, where the data is checkpointed before transaction is committed or aborted.
Yep, that's the killer :-(. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com