Hi all,
I am attempting to do a bunch of table rewrites to shrink a table in the absence of pg_repack and vacuum full (both not an option). The database is growing fast and has had significant bloat in both heaps and indexes, so index rebuilds alone won't cut it. We found that table inheritance can be used to achieve this rather nicely. We are running PG v9.6.18.
We are setting up the inheritance as follows:
BEGIN;
ALTER TABLE #{table} RENAME TO #{table}_old;
CREATE TABLE #{table} (LIKE #{table}_old INCLUDING ALL);
ALTER TABLE #{table}_old INHERIT #{table};
ALTER SEQUENCE #{table}_id_seq OWNED BY #{table}.id;
COMMIT;
Then, the migration itself runs as follows (each in a transaction, looping through records and sleeping for a bit)
WITH del AS (
DELETE FROM #{old_table}
WHERE id IN (
SELECT id
FROM #{old_table}
WHERE id > #{max_deleted_id} -- This is the max deleted from the previous batch, we grab it programmatically.
ORDER BY id ASC
LIMIT #{batch_size}
)
RETURNING *
)
INSERT INTO #{table}
SELECT * FROM del
RETURNING id
For instance, the batch_size can be 10_000, and the code sleeps programatically for 200ms (this is done in a Ruby script).
"max_deleted_id" is passed to each run from the previous one. This improves the inner SELECT query.
This works very well. However, I noticed two suprising things:
1. The performance of the delete and insert drops by several orders of magnitude as the script runs. For instance, in one run, it goes from 150ms average run to 700ms per batch.
2. The explain itself takes a while to run on a sample batch. In one table, the explain alone took four seconds.
To try and reproduce this locally, I used the following dummy table:
create table towns (id serial primary key, code text, article text, name text, department text);
insert into towns (
code, article, name, department
)
select
left(md5(i::text), 10),
md5(random()::text),
md5(random()::text),
left(md5(random()::text), 4)
from generate_series(1, 100000000) s(i);
This spends 150ms per batch, which climbs to 700ms per batch. A vacuum of the old table lowers is back to 150ms, but I don't understand why, because we structure the query to jump over all previously dead rows. There is an old thread in which Tom Lane mentions that the planner might itself be walking that primary index. Is this applicable here? And is there anything we can do besides more aggressive and continued vacuuming of the old table (or a change in autovacuum settings)? Ideally, we want to run this overnight without much supervision.
Is the problem really about writing the new table, or reading from the old table?
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.