Re: select query takes 13 seconds to run with index

"Gregory Williamson" <Gregory.Williamson@xxxxxxxxxxxxxxxx> · Mon, 26 May 2008 18:04:39 -0600

Title: RE: [GENERAL] select query takes 13 seconds to run with index

Justin wrote:

>

>

> mark wrote:

>> On Mon, May 26, 2008 at 4:26 PM, Justin <justin@xxxxxxxxxxxxxxx> wrote:

>>  

>>> mark wrote:

>>>    

>>>> Hi, is there anyway this can be made faster?  id is the primary key,

>>>> and there is an index on uid..

>>>> thanks

>>>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id

>>>> DESC limit 6;

>>>>                                                                     QUERY

>>>> PLAN

>>>>

>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------

>>>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual

>>>> time=13612.247..13612.247 rows=0 loops=1)

>>>>   ->  Index Scan Backward using pokes_pkey on pokes

>>>> (cost=0.00..5182270.69 rows=3333 width=135) (actual

>>>> time=13612.245..13612.245 rows=0 loops=1)

>>>>         Filter: (uid = 578439028)

>>>>  Total runtime: 13612.369 ms

>>>> (4 rows)

>>>>      

>>> First this should be posted on performance list.

>>>    

>> sorry about this.

>>

>>  

>>> how many records are in this table?

>>>    

>> 22334262, 22 million records.

>>

>>  

>>> The estimate is way off, when was the last time Vaccum was on the table?

>>>    

>> about a week ago i ran this VACUUM VERBOSE ANALYZE;

>> this table is never updated or deleted, rows are just inserted...

>>

>>

>>  

>>> What verison of Postgresql are you running

>>>    

>> 8.3.1

>>

>>  

>>> Size of the Table

>>>    

>> 22 million rows approximately

>>  

> I have no experience  on large datasets so people with more experience in this area are going to have to chime in.

> My gut feel is 13 seconds for Postgresql to sort through an index of that size and table is not bad.

>

> you may need to take a look at hardware and postgresql.config settings to improve the performance for this query

>

> This query is very simple where changing it around or adding index results massive improvements is not going to help in this case.

>>  

>>> Table layout

>>>    

>> CREATE TABLE pokes

>> (

>>   id serial NOT NULL,

>>   uid integer,

>>   action_id integer,

>>   created timestamp without time zone DEFAULT now(),

>>   friend_id integer,

>>   message text,

>>   pic text,

>>   "name" text,

>>   CONSTRAINT pokes_pkey PRIMARY KEY (id)

>> )

>> WITH (OIDS=FALSE);

>> ALTER TABLE pokes OWNER TO postgres;

>>

>> -- Index: idx_action_idx

>>

>> -- DROP INDEX idx_action_idx;

>>

>> CREATE INDEX idx_action_idx

>>   ON pokes

>>   USING btree

>>   (action_id);

>>

>> -- Index: idx_friend_id

>>

>> -- DROP INDEX idx_friend_id;

>>

>> CREATE INDEX idx_friend_id

>>   ON pokes

>>   USING btree

>>   (friend_id);

>>

>> -- Index: idx_pokes_uid

>>

>> -- DROP INDEX idx_pokes_uid;

>>

>> CREATE INDEX idx_pokes_uid

>>   ON pokes

>>   USING btree

>>   (uid);

>>

>>

>>  

>>> Load on the database

>>>    

>> how do i measure load on database?

>>  

>

> How many users are attached to the server at any given time.  how many inserts, deletes selects are being done on the server.  Its number  TPS  on the server.

Justin wrote:

>

>

> mark wrote:

>> On Mon, May 26, 2008 at 4:26 PM, Justin <justin@xxxxxxxxxxxxxxx> wrote:

>>  

>>> mark wrote:

>>>    

>>>> Hi, is there anyway this can be made faster?  id is the primary key,

>>>> and there is an index on uid..

>>>> thanks

>>>> EXPLAIN ANALYZE select * from pokes where uid = 578439028 order by id

>>>> DESC limit 6;

>>>>                                                                     QUERY

>>>> PLAN

>>>>

>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------

>>>>  Limit  (cost=0.00..9329.02 rows=6 width=135) (actual

>>>> time=13612.247..13612.247 rows=0 loops=1)

>>>>   ->  Index Scan Backward using pokes_pkey on pokes

>>>> (cost=0.00..5182270.69 rows=3333 width=135) (actual

>>>> time=13612.245..13612.245 rows=0 loops=1)

>>>>         Filter: (uid = 578439028)

>>>>  Total runtime: 13612.369 ms

>>>> (4 rows)

>>>>      

>>> First this should be posted on performance list.

>>>    

>> sorry about this.

>>

>>  

>>> how many records are in this table?

>>>    

>> 22334262, 22 million records.

>>

>>  

>>> The estimate is way off, when was the last time Vaccum was on the table?

>>>    

>> about a week ago i ran this VACUUM VERBOSE ANALYZE;

>> this table is never updated or deleted, rows are just inserted...

>>

>>

>>  

>>> What verison of Postgresql are you running

>>>    

>> 8.3.1

>>

>>  

>>> Size of the Table

>>>    

>> 22 million rows approximately

>>  

> I have no experience  on large datasets so people with more experience in this area are going to have to chime in.

> My gut feel is 13 seconds for Postgresql to sort through an index of that size and table is not bad.

>

> you may need to take a look at hardware and postgresql.config settings to improve the performance for this query

>

> This query is very simple where changing it around or adding index results massive

> improvements is not going to help in this case.

I just ran a test on not particularly impressive hardware (8.2.6) on a table with 58980741 rows:

billing=# explain analyze select * from stats_asset_use where date = '2006-03-12' order by tracking_id desc limit 6;

                                                                      QUERY PLAN                                      

-------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=0.00..5.45 rows=6 width=38) (actual time=0.028..0.037 rows=6 loops=1)

   ->  Index Scan Backward using stats_day_ndx on stats_asset_use  (cost=0.00..61279.91 rows=67437 width=38) (actual time=0.026..0.032 rows=6 loops=1)

         Index Cond: (date = '2006-03-12'::date)

 Total runtime: 5.957 ms

(4 rows)

There is an index on date (only). A typical day might have anywhere from a few thousand entries to a few hundred thousand with the average in the low thousands. Inserts only, no deletes or updates.

This table gets analyzed daily (overkill) so the stats are up to date; I wonder if that's a problem in your case ?

>>  

>>> Table layout

>>>    

>> CREATE TABLE pokes

>> (

>>   id serial NOT NULL,

>>   uid integer,

>>   action_id integer,

>>   created timestamp without time zone DEFAULT now(),

>>   friend_id integer,

>>   message text,

>>   pic text,

>>   "name" text,

>>   CONSTRAINT pokes_pkey PRIMARY KEY (id)

>> )

>> WITH (OIDS=FALSE);

>> ALTER TABLE pokes OWNER TO postgres;

>>

>> -- Index: idx_action_idx

>>

>> -- DROP INDEX idx_action_idx;

>>

>> CREATE INDEX idx_action_idx

>>   ON pokes

>>   USING btree

>>   (action_id);

>>

>> -- Index: idx_friend_id

>>

>> -- DROP INDEX idx_friend_id;

>>

>> CREATE INDEX idx_friend_id

>>   ON pokes

>>   USING btree

>>   (friend_id);

>>

>> -- Index: idx_pokes_uid

>>

>> -- DROP INDEX idx_pokes_uid;

>>

>> CREATE INDEX idx_pokes_uid

>>   ON pokes

>>   USING btree

>>   (uid);

>>

>>

>>  

>>> Load on the database

>>>    

>> how do i measure load on database?

>>  

>

> How many users are attached to the server at any given time.  how many inserts, deletes

> selects are being done on the server.  Its number  TPS  on the server.

On Windoze I don't know; on *NIX variants the utility "top" can show useful information on load and active processes; iostat or vmstat can give detailed looks over time (use period of between 1 and 5 seconds maybe and discard the first row as nonsense); they show disk i/o and context switching, etc.

HTH,

Greg Williamson

Senior DBA

DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)