Hi,

Running PostgreSQL 9.4, I am running into the following issue. On a huge table, I have to remove the content (set it to '') of a column that accounts for 99% of the table's weight. Since production cannot be stopped, I am not able to do a VACUUM FULL nor any direct rewrite of the table, so I tried the following trick to drop the content while shrinking the table (re-indented for this email; note the arithmetic uses $(( )), the backticks in my first draft were wrong):

    ppr=500
    for i in `seq 0 60` ; do
        psql $MYDB -c "UPDATE $TABLE SET content = ''::bytea
            WHERE ctid = ANY(ARRAY(
                SELECT ('(' || p.i || ',' || s.i || ')')::tid
                FROM generate_series($(($i * $ppr)), $((($i + 1) * $ppr))) AS p(i),
                     generate_series(0, 2048) AS s(i)
            ));"
        psql $MYDB -c "VACUUM $TABLE;"
    done

This reduced my table from 1200MB down to 240MB (29000 pages), while I expected the final result to be even lower. I made a copy of the table, and it needed only 30MB (3800 pages), so something was still wrong.

I ran a simple (and slow) query to count the number of rows per page on both tables. The new one had slightly more rows per page, but in no way enough to justify such a gap. Then I saw that after page 2080, the table had, for the following 27000 pages, between 0 and 5 rows per page. So VACUUM could not truncate the table because of those gaps.

I figured that, after a VACUUM, updated rows at the end of the table should be put back into the free space at the beginning, and thus, after updating the rows of the last 27000 pages, the table could be reduced to the optimal size. But when I ran the following, I was very surprised (NB: I know this query does a full scan, I won't do it on a 100GB table, I promise):

    VACUUM VERBOSE $TABLE;
    SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
    => 29 rows
    UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
    => 29 rows updated
    SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
    => 29 rows again!
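(For reference, the rows-per-page count mentioned above can be obtained with a query along these lines; this is a sketch, with "mytable" standing in for the real table, and like any full ctid scan it is slow:)

```
-- Count live rows per heap page by extracting the page number from ctid.
-- ctid's text form is '(page,offset)', which casts cleanly through point.
SELECT (ctid::text::point)[0]::bigint AS page,
       count(*)                       AS rows
FROM mytable
GROUP BY 1
ORDER BY 1;
```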
So instead of filling the huge gaps (there is, for instance, an 80-page gap, from page 2082 to page 2160), Pg put the new row versions back at the end of the table. I'm quite puzzled by this situation. I believed PostgreSQL would use the free space for the new data, so that my update loop would leave me a clean, «packed» table.

What behaviour did I miss here? How can I get PostgreSQL to use that free space without falling back on a VACUUM FULL? (And, if possible, without tools like pg_repack, because they seem unnecessary in this situation.)

Thanks,
Pierre