Re: feeding big script to psql

Tom Lane wrote:
> Peter Wilson <petew@xxxxxxxxxxxxxxxx> writes:
>> I found a while ago that after inserting a lot of rows into a clean
>> Postgres table it would take several minutes just to analyse a command,
>> not even starting the execution.
>
> Oh?  Could you provide a test case for this?  I can certainly believe
> that the planner might choose a bad plan if it has no statistics, but
> it shouldn't take a long time to do it.
>
> 			regards, tom lane

Hi Tom,
I've spent half an hour restoring the database from the backup I took back in December '04. The database dump is about 95 Mbytes; none of the tables is particularly big.

On investigation, the problem occurs with 'EXPLAIN ANALYZE', which is what pgAdmin III issues when you press the explain button. According to the manual, this actually executes the command and reports actual times. (I think pgAdmin is wrong here: if you ask it to explain a command, you want the plan it's going to use, not to wait for it to finish!)
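
For reference, the difference is easy to demonstrate: plain EXPLAIN only plans the statement, while EXPLAIN ANALYZE runs it and reports measured times. A trivial illustration (any statement works; count(*) is just a convenient example):

   EXPLAIN SELECT count(*) FROM contacts;          -- shows the plan, returns immediately
   EXPLAIN ANALYZE SELECT count(*) FROM contacts;  -- executes the query, adds 'actual time' figures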

That said, I'll include the details here, at least partly because the tables are small in database terms, and I don't understand how this query, even with lots of scanning, can hold 100% CPU for over 10 minutes (654 seconds for EXPLAIN ANALYZE, 651 seconds for plain execution).

OK - the query is:

SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname FROM contacts c
   WHERE c.client_id = 'gadget'
     AND c.instance = '0'
     AND (    c.type = 'COMMUNITY'
           OR c.type = 'OU'
           OR c.type = 'INDIVIDUAL'
         )
     AND c.contact_id in (
            SELECT subb.community_id  FROM contact_att subb
               WHERE subb.client_id = 'gadget'
                 AND subb.instance = '0'
                 AND subb.contact_id = 3854.000000
        ) ;

By itself, the sub-select executes in 235 ms and yields a set that does not vary with the outer select: it is uncorrelated, so in effect it is a constant set. Even in the worst case, with no index or size information, I would expect the planner to spot the invariance of the sub-select, leaving a sequential scan of the 'contacts' table as the worst result.
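
Since the sub-select is uncorrelated, the same rows should be reachable with an explicit join instead of IN. Just a sketch of the equivalent formulation (I have not timed this variant; the DISTINCT should collapse any duplicates the join introduces):

SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname
   FROM contacts c
   JOIN contact_att subb ON subb.community_id = c.contact_id
   WHERE c.client_id = 'gadget'
     AND c.instance = '0'
     AND c.type IN ('COMMUNITY', 'OU', 'INDIVIDUAL')
     AND subb.client_id = 'gadget'
     AND subb.instance = '0'
     AND subb.contact_id = 3854.000000;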

There are two tables involved in this query.

	'contacts' contains 3728 records.
	'contact_att' contains 19217 records.

The query plan yields:

Unique  (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
  ->  Sort  (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
        Sort Key: c.client_id, c.instance, c.contact_id, c.uname
        ->  Nested Loop IN Join  (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
              Join Filter: ("outer".contact_id = "inner".community_id)
              ->  Index Scan using cos_uk on contacts c  (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
                    Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                    Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
              ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
                    Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                    Filter: ((contact_id)::numeric = 3854.000000)
Total runtime: 654492.320 ms
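
The plan does at least show where the time goes: the inner index scan on contact_att costs about 234 ms per iteration and is repeated once per outer row (loops=2791), and 2791 x 234.4 ms is roughly 654,000 ms, which matches the total runtime almost exactly. Note also that the planner estimated rows=1 for the outer scan that actually produced 2791 rows, so it presumably thought a single 234 ms probe was acceptable. Assuming the estimates are off simply because the tables were never analysed after the bulk load, the obvious first step (a guess on my part, not something I have re-run) would be:

ANALYZE contacts;      -- collect statistics so the planner can estimate row counts
ANALYZE contact_att;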

Definitions for the two relevant tables are:

create table contacts (
    INSTANCE CHARACTER (1)  NOT NULL ,
    client_id varchar (50) not null ,
    contact_id bigint default nextval('contacts_contact_id_seq'::text),
    UNAME VARCHAR (32)  NOT NULL ,
    TYPE VARCHAR (20)  NOT NULL DEFAULT 'INDIVIDUAL',
    parent bigint,
    NAME VARCHAR (240) ,
    PHONE VARCHAR (32) ,
    FAX VARCHAR (32) ,
    EMAIL VARCHAR (240) ,
    BASIC_PW VARCHAR (128) ,
    DESCRIPTION VARCHAR (240),
    custom_data varchar(8192),
CONSTRAINT COS_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID)
);

create table contact_att(
    instance character(1),
    client_id varchar(50) not null,
    contact_id bigint,
    community_id bigint,
    inherited smallint,
CONSTRAINT CA_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID,COMMUNITY_ID)
);

CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);
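
One detail worth noting from the definitions: contact_id is a bigint, but the query compares it with the literal 3854.000000, which PostgreSQL parses as numeric; that is why the plan's Filter line shows (contact_id)::numeric, and the cast stops ca_pk from matching on the contact_id column. A sketch of the sub-select with the constant cast instead of the column (assuming the value is always integral):

SELECT subb.community_id FROM contact_att subb
   WHERE subb.client_id = 'gadget'
     AND subb.instance = '0'
     AND subb.contact_id = 3854::bigint;  -- bigint constant, so no cast on the indexed column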

--
Hope you find that useful! If there is anything else I can provide, please do let me know.

Pete


------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk 	



