Re: Slow query, possibly not using index

Les <nagylzs@xxxxxxxxx> · Mon, 28 Aug 2023 08:04:28 +0200

> I'm aware of the problems with random UUID values. I was using this

> function to create ulids from the beginning:

Oh, well that would have been useful information to provide at the

outset.  
I'm sorry, I left this out. 
Now that we know the index order is correlated with creation

time, I wonder if it is also correlated with nrefs, in such a way that

scanning in index order is disastrous because all the desired rows are

at the end of the index.
Possibly, I have no idea.

Also, you deny deleting any rows, but that wasn't the point.  Do you

ever update nrefs from zero to nonzero?  That'd also leave dead

entries behind in this index.  If that is a routine event that is

correlated with creation time, it gets easier to believe that your

index could have lots of dead entries at the front.

I have checked the trigger that is maintaining the nrefs field. Blocks are referenced from a "file_block" table. Every time a block is created, it first has an initial value of nrefs=0, then a file_block row (reference) is inserted, and nrefs is incremented to one. It means that every block has shown up once in the index, and then disappeared. If the index was never vacuumed, then it is very plausible that it is full of dead rows.

CREATE OR REPLACE FUNCTION media.trg_aiud_file_block()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
begin
	if TG_OP='INSERT' then
		update media.block set nrefs = nrefs + 1 where id = new.block_id;
		return new;
	end if;
	if TG_OP='UPDATE' then
		if old.block_id is distinct from new.block_id then
			update media.block set nrefs = nrefs + 1 where id = new.block_id;
			update media.block set nrefs = nrefs - 1 where id = old.block_id;
		end if;
		return new;
	end if;
	if TG_OP='DELETE' then
		update media.block set nrefs = nrefs - 1 where id = old.block_id;
		return old;
	end if;
end;
$function$
;

The idea was to create an index that can help in quickly removing unused blocks, to free up disk space. It would be much better to keep out the initially inserted (not yet references) from the index, but I don't know how to do this.

We'd still have to figure out why autovacuum is failing to clean out

those entries in a timely fashion, but that seems like a plausible

way for the performance problem to exist.
Yes, that would be very good to know. I cloud drop and recreate the index now, but after some time I would be facing the same situation again.

I double checked, and the "autovacuum launcher" process is running.

Here are the current settings:

=# select name, setting, unit, min_val, max_val, boot_val, reset_val, pending_restart from pg_settings where name like '%vacuum%';
                 name                  |  setting   | unit | min_val |  max_val   |  boot_val  | reset_val  | pending_restart 
---------------------------------------+------------+------+---------+------------+------------+------------+-----------------
 autovacuum                            | on         |      |         |            | on         | on         | f
 autovacuum_analyze_scale_factor       | 0.1        |      | 0       | 100        | 0.1        | 0.1        | f
 autovacuum_analyze_threshold          | 50         |      | 0       | 2147483647 | 50         | 50         | f
 autovacuum_freeze_max_age             | 200000000  |      | 100000  | 2000000000 | 200000000  | 200000000  | f
 autovacuum_max_workers                | 3          |      | 1       | 262143     | 3          | 3          | f
 autovacuum_multixact_freeze_max_age   | 400000000  |      | 10000   | 2000000000 | 400000000  | 400000000  | f
 autovacuum_naptime                    | 60         | s    | 1       | 2147483    | 60         | 60         | f
 autovacuum_vacuum_cost_delay          | 2          | ms   | -1      | 100        | 2          | 2          | f
 autovacuum_vacuum_cost_limit          | -1         |      | -1      | 10000      | -1         | -1         | f
 autovacuum_vacuum_insert_scale_factor | 0.2        |      | 0       | 100        | 0.2        | 0.2        | f
 autovacuum_vacuum_insert_threshold    | 1000       |      | -1      | 2147483647 | 1000       | 1000       | f
 autovacuum_vacuum_scale_factor        | 0.2        |      | 0       | 100        | 0.2        | 0.2        | f
 autovacuum_vacuum_threshold           | 50         |      | 0       | 2147483647 | 50         | 50         | f
 autovacuum_work_mem                   | -1         | kB   | -1      | 2147483647 | -1         | -1         | f
 log_autovacuum_min_duration           | 600000     | ms   | -1      | 2147483647 | 600000     | 600000     | f
 vacuum_cost_delay                     | 0          | ms   | 0       | 100        | 0          | 0          | f
 vacuum_cost_limit                     | 200        |      | 1       | 10000      | 200        | 200        | f
 vacuum_cost_page_dirty                | 20         |      | 0       | 10000      | 20         | 20         | f
 vacuum_cost_page_hit                  | 1          |      | 0       | 10000      | 1          | 1          | f
 vacuum_cost_page_miss                 | 2          |      | 0       | 10000      | 2          | 2          | f
 vacuum_defer_cleanup_age              | 0          |      | 0       | 1000000    | 0          | 0          | f
 vacuum_failsafe_age                   | 1600000000 |      | 0       | 2100000000 | 1600000000 | 1600000000 | f
 vacuum_freeze_min_age                 | 50000000   |      | 0       | 1000000000 | 50000000   | 50000000   | f
 vacuum_freeze_table_age               | 150000000  |      | 0       | 2000000000 | 150000000  | 150000000  | f
 vacuum_multixact_failsafe_age         | 1600000000 |      | 0       | 2100000000 | 1600000000 | 1600000000 | f
 vacuum_multixact_freeze_min_age       | 5000000    |      | 0       | 1000000000 | 5000000    | 5000000    | f
 vacuum_multixact_freeze_table_age     | 150000000  |      | 0       | 2000000000 | 150000000  | 150000000  | f
(27 rows)

I think I did not change the defaults.

> I can try to do VACUUM on this table, but I'm limited on resources. I think

> it will take days to do this. Maybe I can try to dump the whole database

> and restore it on another machine.

Pretty hard to believe that dump-and-restore would be faster than

VACUUM.

> (Is there a way to check the number of dead rows?)

I think contrib/pgstattuple might help.

All right, I started pgstattuple()  and I'll also do pgstatindex(), but it takes a while. I'll get back with the results.

Thank you for your help!

Regards,

    Laszlo