> I'm aware of the problems with random UUID values. I was using this
> function to create ulids from the beginning:
Oh, well that would have been useful information to provide at the
outset.
I'm sorry, I left this out.
Now that we know the index order is correlated with creation
time, I wonder if it is also correlated with nrefs, in such a way that
scanning in index order is disastrous because all the desired rows are
at the end of the index.
Possibly, I have no idea.
Also, you deny deleting any rows, but that wasn't the point. Do you
ever update nrefs from zero to nonzero? That'd also leave dead
entries behind in this index. If that is a routine event that is
correlated with creation time, it gets easier to believe that your
index could have lots of dead entries at the front.
I have checked the trigger that is maintaining the nrefs field. Blocks are referenced from a "file_block" table. Every time a block is created, it first has an initial value of nrefs=0, then a file_block row (reference) is inserted, and nrefs is incremented to one. It means that every block has shown up once in the index, and then disappeared. If the index was never vacuumed, then it is very plausible that it is full of dead rows.
CREATE OR REPLACE FUNCTION media.trg_aiud_file_block()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
begin
if TG_OP='INSERT' then
update media.block set nrefs = nrefs + 1 where id = new.block_id;
return new;
end if;
if TG_OP='UPDATE' then
if old.block_id is distinct from new.block_id then
update media.block set nrefs = nrefs + 1 where id = new.block_id;
update media.block set nrefs = nrefs - 1 where id = old.block_id;
end if;
return new;
end if;
if TG_OP='DELETE' then
update media.block set nrefs = nrefs - 1 where id = old.block_id;
return old;
end if;
end;
$function$
;
The idea was to create an index that can help in quickly removing unused blocks, to free up disk space. It would be much better to keep out the initially inserted (not yet references) from the index, but I don't know how to do this.
We'd still have to figure out why autovacuum is failing to clean out
those entries in a timely fashion, but that seems like a plausible
way for the performance problem to exist.
Yes, that would be very good to know. I cloud drop and recreate the index now, but after some time I would be facing the same situation again.
I double checked, and the "autovacuum launcher" process is running.
Here are the current settings:
=# select name, setting, unit, min_val, max_val, boot_val, reset_val, pending_restart from pg_settings where name like '%vacuum%';
name | setting | unit | min_val | max_val | boot_val | reset_val | pending_restart
---------------------------------------+------------+------+---------+------------+------------+------------+-----------------
autovacuum | on | | | | on | on | f
autovacuum_analyze_scale_factor | 0.1 | | 0 | 100 | 0.1 | 0.1 | f
autovacuum_analyze_threshold | 50 | | 0 | 2147483647 | 50 | 50 | f
autovacuum_freeze_max_age | 200000000 | | 100000 | 2000000000 | 200000000 | 200000000 | f
autovacuum_max_workers | 3 | | 1 | 262143 | 3 | 3 | f
autovacuum_multixact_freeze_max_age | 400000000 | | 10000 | 2000000000 | 400000000 | 400000000 | f
autovacuum_naptime | 60 | s | 1 | 2147483 | 60 | 60 | f
autovacuum_vacuum_cost_delay | 2 | ms | -1 | 100 | 2 | 2 | f
autovacuum_vacuum_cost_limit | -1 | | -1 | 10000 | -1 | -1 | f
autovacuum_vacuum_insert_scale_factor | 0.2 | | 0 | 100 | 0.2 | 0.2 | f
autovacuum_vacuum_insert_threshold | 1000 | | -1 | 2147483647 | 1000 | 1000 | f
autovacuum_vacuum_scale_factor | 0.2 | | 0 | 100 | 0.2 | 0.2 | f
autovacuum_vacuum_threshold | 50 | | 0 | 2147483647 | 50 | 50 | f
autovacuum_work_mem | -1 | kB | -1 | 2147483647 | -1 | -1 | f
log_autovacuum_min_duration | 600000 | ms | -1 | 2147483647 | 600000 | 600000 | f
vacuum_cost_delay | 0 | ms | 0 | 100 | 0 | 0 | f
vacuum_cost_limit | 200 | | 1 | 10000 | 200 | 200 | f
vacuum_cost_page_dirty | 20 | | 0 | 10000 | 20 | 20 | f
vacuum_cost_page_hit | 1 | | 0 | 10000 | 1 | 1 | f
vacuum_cost_page_miss | 2 | | 0 | 10000 | 2 | 2 | f
vacuum_defer_cleanup_age | 0 | | 0 | 1000000 | 0 | 0 | f
vacuum_failsafe_age | 1600000000 | | 0 | 2100000000 | 1600000000 | 1600000000 | f
vacuum_freeze_min_age | 50000000 | | 0 | 1000000000 | 50000000 | 50000000 | f
vacuum_freeze_table_age | 150000000 | | 0 | 2000000000 | 150000000 | 150000000 | f
vacuum_multixact_failsafe_age | 1600000000 | | 0 | 2100000000 | 1600000000 | 1600000000 | f
vacuum_multixact_freeze_min_age | 5000000 | | 0 | 1000000000 | 5000000 | 5000000 | f
vacuum_multixact_freeze_table_age | 150000000 | | 0 | 2000000000 | 150000000 | 150000000 | f
(27 rows)
name | setting | unit | min_val | max_val | boot_val | reset_val | pending_restart
---------------------------------------+------------+------+---------+------------+------------+------------+-----------------
autovacuum | on | | | | on | on | f
autovacuum_analyze_scale_factor | 0.1 | | 0 | 100 | 0.1 | 0.1 | f
autovacuum_analyze_threshold | 50 | | 0 | 2147483647 | 50 | 50 | f
autovacuum_freeze_max_age | 200000000 | | 100000 | 2000000000 | 200000000 | 200000000 | f
autovacuum_max_workers | 3 | | 1 | 262143 | 3 | 3 | f
autovacuum_multixact_freeze_max_age | 400000000 | | 10000 | 2000000000 | 400000000 | 400000000 | f
autovacuum_naptime | 60 | s | 1 | 2147483 | 60 | 60 | f
autovacuum_vacuum_cost_delay | 2 | ms | -1 | 100 | 2 | 2 | f
autovacuum_vacuum_cost_limit | -1 | | -1 | 10000 | -1 | -1 | f
autovacuum_vacuum_insert_scale_factor | 0.2 | | 0 | 100 | 0.2 | 0.2 | f
autovacuum_vacuum_insert_threshold | 1000 | | -1 | 2147483647 | 1000 | 1000 | f
autovacuum_vacuum_scale_factor | 0.2 | | 0 | 100 | 0.2 | 0.2 | f
autovacuum_vacuum_threshold | 50 | | 0 | 2147483647 | 50 | 50 | f
autovacuum_work_mem | -1 | kB | -1 | 2147483647 | -1 | -1 | f
log_autovacuum_min_duration | 600000 | ms | -1 | 2147483647 | 600000 | 600000 | f
vacuum_cost_delay | 0 | ms | 0 | 100 | 0 | 0 | f
vacuum_cost_limit | 200 | | 1 | 10000 | 200 | 200 | f
vacuum_cost_page_dirty | 20 | | 0 | 10000 | 20 | 20 | f
vacuum_cost_page_hit | 1 | | 0 | 10000 | 1 | 1 | f
vacuum_cost_page_miss | 2 | | 0 | 10000 | 2 | 2 | f
vacuum_defer_cleanup_age | 0 | | 0 | 1000000 | 0 | 0 | f
vacuum_failsafe_age | 1600000000 | | 0 | 2100000000 | 1600000000 | 1600000000 | f
vacuum_freeze_min_age | 50000000 | | 0 | 1000000000 | 50000000 | 50000000 | f
vacuum_freeze_table_age | 150000000 | | 0 | 2000000000 | 150000000 | 150000000 | f
vacuum_multixact_failsafe_age | 1600000000 | | 0 | 2100000000 | 1600000000 | 1600000000 | f
vacuum_multixact_freeze_min_age | 5000000 | | 0 | 1000000000 | 5000000 | 5000000 | f
vacuum_multixact_freeze_table_age | 150000000 | | 0 | 2000000000 | 150000000 | 150000000 | f
(27 rows)
I think I did not change the defaults.
> I can try to do VACUUM on this table, but I'm limited on resources. I think
> it will take days to do this. Maybe I can try to dump the whole database
> and restore it on another machine.
Pretty hard to believe that dump-and-restore would be faster than
VACUUM.
> (Is there a way to check the number of dead rows?)
I think contrib/pgstattuple might help.
All right, I started pgstattuple() and I'll also do pgstatindex(), but it takes a while. I'll get back with the results.
Thank you for your help!
Regards,
Laszlo