On Fri, Jan 6, 2017 at 4:09 AM, Pierre Ducroquet <pierre.ducroquet@xxxxxxxxxxxxxx> wrote:
> Hi
>
> Running PostgreSQL 9.4, I am running into the following issue.
> On a huge table, I have to remove the content (set it to '') of a column
> that makes up 99% of the weight of the table. Since production cannot be
> stopped, I am not able to do a VACUUM FULL nor any direct rewrite of the
> table, so I tried the following trick to drop the content while reducing
> the table size. (I re-indented it for this email)
>
> ppr=500
> for i in `seq 0 60` ; do
>     psql $MYDB -c "UPDATE $TABLE
>         SET content = ''::bytea
>         WHERE ctid = ANY(ARRAY(
>             SELECT ('(' || p.i || ',' || s.i || ')')::tid
>             FROM generate_series($(($i * $ppr)), $((($i + 1) * $ppr))) p(i),
>                  generate_series(0, 2048) AS s(i)
>         ));"
>     psql $MYDB -c "VACUUM $TABLE;"
> done
>
> This reduced my table from 1200MB down to 240MB (29000 pages), while I
> expected the final result to be even lower. I made a copy of the table,
> and it needed only 30MB (3800 pages), so there was still something wrong.
> I ran a simple (and slow) query to count the number of rows per page on
> both tables. The new one had slightly more rows per page, but not nearly
> enough to explain such a gap. Then I saw that after page 2080 the table
> had, for the following 27000 pages, between 0 and 5 rows per page. So
> VACUUM could not truncate those pages because of the gaps.
> I figured that, after a vacuum, updates to the rows at the end of the
> table should place the new tuples back at the beginning of the free
> space, and thus after updating the rows in the last 27000 pages the
> table could be reduced to the optimal size.
> But when I ran the following, I was very surprised (NB: I know this
> query does a full scan, I won't do it on a 100GB table, I promise):
>
> VACUUM VERBOSE $TABLE;
>
> SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> => 29 rows
> UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
> => 29 rows updated
> SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> => 29 rows again!
>
> So instead of filling the huge gaps (I have, for instance, an 80-page
> gap, from page 2082 to page 2160), Pg put the data back at the end of
> the table.
>
> I'm quite puzzled by this situation. I believed PostgreSQL would use the
> free space to put the new data, so my update loop would give me a clean,
> «packed» table.
>
> What behaviour did I miss here? How can I get PostgreSQL to use that
> free space without falling back to VACUUM FULL? (And without using tools
> like pg_repack if possible, because that seems unnecessary in this
> situation.)

hm. Maybe HOT is involved? Does your column have an index on it?
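
If HOT is kicking in, each new row version is placed on the same page
as the old one whenever there is room, which would explain why your
rows never leave the tail pages. The stats views will tell you how many
of your updates were HOT; a quick sketch, assuming your table is
literally named big_table (substitute the real name):

    -- how many updates were HOT (same-page) vs. regular
    SELECT n_tup_upd, n_tup_hot_upd
      FROM pg_stat_user_tables
     WHERE relname = 'big_table';

    -- indexes covering the "content" column; any hit here means an
    -- update that changes "content" cannot be HOT
    SELECT i.indexrelid::regclass AS index_name
      FROM pg_index i
      JOIN pg_attribute a ON a.attrelid = i.indrelid
                         AND a.attnum = ANY (i.indkey)
     WHERE i.indrelid = 'big_table'::regclass
       AND a.attname = 'content';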
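
You can also ask the free space map directly whether those gaps are
known to be reusable; if VACUUM never recorded them, new tuples will
keep landing at the end. The pg_freespacemap contrib module ships with
9.4 (same big_table assumption as above):

    CREATE EXTENSION pg_freespacemap;

    -- pages the FSM believes have reusable space
    SELECT blkno, avail
      FROM pg_freespace('big_table')
     WHERE avail > 0
     ORDER BY blkno
     LIMIT 20;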
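
For what it's worth, the rows-per-page tally you describe can be done
straight from ctid (still a full scan, of course):

    -- distribution of live rows per heap page
    SELECT (ctid::text::point)[0]::bigint AS page_no, count(*) AS n_rows
      FROM big_table
     GROUP BY 1
     ORDER BY 1;

merlin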