Re: How many table scans in a delete...

"Scott Marlowe" <scott.marlowe@xxxxxxxxx> · Fri, 4 Jul 2008 16:24:32 -0600

On Fri, Jul 4, 2008 at 10:00 AM, Jessica Richard <rjessil@xxxxxxxxx> wrote:
> I am just trying to understand how Postgres does its delete work.
>
> If I have a table testDad with a primary key on cola and no index on colb,
> and I have a kid table testKid with a foreign key to reference testDad but
> no index created on the foreign key column on table testKid.
>
> I have 10,000 rows in each table and I want to delete 5000 rows from the Dad
> table, of course , I have to kill all the kids in the Kid table first so
> that I won't get the constraint error.

Generally speaking, that's doing things wrong.  It's generally easier
to use a cascading delete fk so that you don't have to delete the rows
from the kid table first.

> Now I am ready to run my delete command on the Dad table with the following
> command:
>
> delete from testDad where colb = 'abc';
>
> (supposed select count(*) from testDad where colb = 'abc' will give me 5000
> rows)
>
> Since I don't have any index on testDad.colb, I know it is going to do a
> table scan on the table testDad to find all the qualified rows for the
> delete.
>
> My question1: how many table scans will this single transaction do  to find
> all  5000 qualified rows on the Dad table testDad? Does it scan the entire
> table once to get all qualified deletes? or it has to do the table scan 5000
> times on testDad?

One on table Dad.

> then, after all the 5000 qualified rows have been found on table testDad,
> the constraints between the Dad and Kid table will be checked against those
> 5000 qualified rows on table testDad.
>
> My question 2: does it take one qualified row at a time from the Dad table
> then  do a table scan on the kid table for constraint check? In this case,
> it will have to do 5000 times of table scan on the kid table. very unlikely
> it will scan the kid table only once to do all constraint checking for 5000
> different primary values...

Maybe.  Depends on how much memory it would take to run it with
various join methods.  But the worst case scenario is a seq scan on
the child table for each row in Dad table.  Which is why it's a good
idea to use indexes on FK fields.