Re: Query hitting empty tables taking 48 minutes

Robert Creager <robert@xxxxxxxxxxxxxxxx> · Mon, 18 Jun 2018 16:33:54 -0600

On Jun 18, 2018, at 4:04 PM, Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:

Robert Creager wrote:
A different query started showing up as the problem, the auto_explain with analyze shows an oddity,
the total query duration is 11k seconds, while the query itself is 3 seconds.  I captured a ZFS
snapshot as soon as the problem was noticed.

   db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG:  duration: 10856644.336 ms  plan:                                               
         Query Text: DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1))                 
         Delete on blob  (cost=1308.79..1312.82 rows=1 width=12) (actual time=3465.919..3465.919 rows=0 loops=1)                                        
[...]

Doing a ZFS rollback and executing the query shows what is happening, although not to the extent above.
If I read this correctly, it’s the constraint checks that are causing the query to take so long.
I don’t see any server configuration that might allow those checks to be parallelized.  Is that possible?

tapesystem=# PREPARE foo(uuid) AS DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1)); EXPLAIN ANALYZE EXECUTE foo('bc6e6b10-80ad-4329-9fb9-1a66d8c1505e');
PREPARE

Are we missing an "EXECUTE foo('bc6e6b10-80ad-4329-9fb9-1a66d8c1505e')" here?

Nope, just hidden on the same line, this is the plan for that EXECUTE

                                                           QUERY PLAN                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------                                                                       
 Delete on blob  (cost=9555.07..21134.01 rows=299993 width=12) (actual time=1516.140..1516.140 rows=0 loops=1)
[...]
 Trigger for constraint blob_tape_blob_id_fkey: time=5389.627 calls=300000
[...]
Execution time: 85175.581 ms

I executed a vacuum analyze, then a vacuum full analyze, neither changed the timing.
Other than removing constraints, is there any way to address this?

I cannot explain the discrepancy between the runtimes of 85 seconds versus 10857 seconds.

It would be nice if the auto_explain analyze did include the other information like the psql analyze does.

But other than that, it sure looks like the foreign keys are missing an index on
the source columns, leading to a sequential scan for each deletion and table.

You’d think, but they are present.  As an example, the first constraint blob_tape_blob_id_fkey is indexed.

create table blob
(
[…]
  id            uuid   not null
    constraint blob_pkey
    primary key,
[…]
);

create table blob_tape
(
  blob_id     uuid    not null
    constraint blob_tape_blob_id_fkey
    references ds3.blob
    on update cascade on delete cascade,
  id          uuid    not null
    constraint blob_tape_pkey
    primary key,
[…]
);

create index tape_blob_tape__blob_id
  on blob_tape (blob_id);

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com