In addition to the previous recommendation, make sure you have an index on
dokumnr in table omrid.
EG: dokumnr in table omrid.
CREATE INDEX omrid_dokumnr_fk ON omrid
On Sat, Sep 26, 2015 at 7:33 AM, David Rowley <david.rowley@xxxxxxxxxxxxxxx> wrote:
On 26 September 2015 at 19:53, Andrus <kobruleht2@xxxxxx> wrote:Hi!
I'm looking for a way to delete records which do not have child rows on big tables where lot of rows needs to be deleted. Both tables have lot of other foreign key references.
Document headers are in omdok table:
create table omdok ( dokumnr serial primary key, ... );
Document rows are in omrid table
CREATE TABLE omrid
(
id serial NOT NULL,
reanr serial NOT NULL,
dokumnr integer NOT NULL,
CONSTRAINT omrid_pkey PRIMARY KEY (id),
CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr)
REFERENCES omdok (dokumnr) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
....
);
I tried
delete from omdok where dokumnr not in (select dokumnr from omrid)
Query it is running currently 15 hours and is still running.
postgres.exe is using 50% CPU all the time (this is 2 core CPU).
explain delete from omdok where dokumnr not in (select dokumnr from omrid)
returns:
"Delete (cost=0.00..21971079433.34 rows=220815 width=6)"
" -> Seq Scan on omdok (cost=0.00..21971079433.34 rows=220815 width=6)"
" Filter: (NOT (SubPlan 1))"
" SubPlan 1"
" -> Materialize (cost=0.00..94756.92 rows=1897261 width=4)"
" -> Seq Scan on omrid (cost=0.00..77858.61 rows=1897261 width=4)"
- How to delete parents without child rows fast?if you write the query as with a NOT EXISTS, rather than a NOT IN() it should complete much more quickly.It's important to know that the semantics of NOT IN are likely not at all what you think:For example, in the following query, would you expect it to return the row with a.a = 3 ?select * from (values(1),(2),(3)) a(a) WHERE a NOT IN(SELECT b FROM (VALUES(NULL),(1),(2)) b(b));The presence of NULL causes this to not behave the way you might think, yet it works exactly the way the SQL standard dictates.You could think of this as equivalent to writing:select * from (values(1),(2),(3)) a(a) WHERE a.a <> NULL AND a.a <> 1 AND a.a <> 2;And since a.a <> NULL is 'unknown', this causes the entire WHERE clause to be false, therefore nothing matches.For this reason PostgreSQL does not optimise NOT IN() the same way as it optimises NOT EXISTS().If you write the query as:delete from omdok where not exists(select 1 from omrid where omdok.dokumnr = omrid.dokumnr);then you might see it execute in a few seconds. Perhaps you should consider cancelling the current query, perhaps perform a VACUUM on omdoc after cancelling, and then run the NOT EXISTS version instead.I mocked up your tables locally, and populated them with the same number of records as your row estimates in the EXPLAIN you pasted and I got:test=# delete from omdok where not exists (select 1 from omrid where omdok.dokumn = omrid.dokumnr);DELETE 0Time: 1698.233 msWhereas with the NOT IN() I cancelled it after 10 minutes.RegardsDavid Rowley--
David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.