> If I try to select a single unused block this way:
> explain analyze select id from media.block b where nrefs =0 limit 1
> then it runs for more than 10 minutes (I'm not sure how long, I cancelled
> the query after 10 minutes).
Are you sure it isn't blocked on a lock?
Yes, I'm sure. I have created a single database instance from a zfs snapshot and tried the query on that database. It was the only client.
Another theory is that the index contains many thousands of references
to now-dead rows, and the query is vainly searching for a live entry.
Given that EXPLAIN thinks there are only about 2300 live entries,
and yet you say the index is 400MB, this seems pretty plausible.
Nobody ever deleted anything from this table. Since it was created, this has been a write-only table.
Have you disabled autovacuum, or something like that? (REINDEX
could help here, at least till the index gets bloated again.)
I did not disable autovacuum.
You might think that even so, it shouldn't take that long ... but
indexes on UUID columns are a well known performance antipattern.
The index entry order is likely to have precisely zip to do with
the table's physical order, resulting in exceedingly-random access
to the table, which'll be horribly expensive when the table is so
much bigger than RAM. Can you replace the UUID column with a simple
serial (identity) column?
I'm aware of the problems with random UUID values. I was using this function to create ulids from the beginning:
CREATE OR REPLACE FUNCTION public.gen_ulid()
RETURNS uuid
LANGUAGE sql
AS $function$
SELECT (lpad(to_hex(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint), 12, '0') || encode(gen_random_bytes(10), 'hex'))::uuid;
$function$
;
If I order some rows by id values, I can see that their creation times are strictly ascending. I did not write this function, it was taken from this website:
They have a benchmark section where they show that these ULID values are slower to generate (at least with this implementation) but much faster to insert.
I might be able to replace these with int8 values, I need to check.
> I believe it is not actually using the index, because reading a single
> (random?) entry from an index should not run for >10 minutes.
You should believe what EXPLAIN tells you about the plan shape.
(Its rowcount estimates are only estimates, though.)
All of the 40M rows in this table are live. I'm 100% sure about this, because nobody ever deleted rows from this table.
I can try to do VACUUM on this table, but I'm limited on resources. I think it will take days to do this. Maybe I can try to dump the whole database and restore it on another machine. Would that eliminate dead rows? (Is there a way to check the number of dead rows?)
Regards,
Laszlo