Is this alright?
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106078.238 rows=80963 loops=1)
>   Sort Key: rec.startdatetime, rec.id
>   Sort Method:  quicksort  Memory: 43163kB
>   ->  Nested Loop IN Join  (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
>         Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
>         ->  Nested Loop  (cost=360360.86..389999.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
>               Join Filter: ((rec.guid)::text = (status.guid)::text)
>               ->  HashAggregate  (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=80000 loops=1)
>                     ->  Hash Join  (cost=335135.05..354817.67 rows=1108637 width=16) (actual time=36401.247..38505.042 rows=4000000 loops=1)
>                           Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
>                           ->  Function Scan on getcurrentguids  (cost=0.00..260.00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=80000 loops=1)
>                           ->  Hash  (cost=285135.53..285135.53 rows=3999962 width=16) (actual time=35391.697..35391.697 rows=4000000 loops=1)
>                                 ->  Seq Scan on status  (cost=0.00..285135.53 rows=3999962 width=16) (actual time=5.095..32820.746 rows=4000000 loops=1)
>                                       Filter: (startdatetime
>               ->  Index Scan using index_status_startdatetime on status rec  (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 rows=1 loops=80000)
>                     Index Cond: (rec.startdatetime = (max(status.startdatetime)))
>         ->  Function Scan on resolve  (cost=0.00..260.00 rows=1000 width=32) (actual time=0.001..0.001 rows=1 loops=80963)
> Total runtime: 106227.356 ms
> (18 rows)
>
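What stands out in this plan is the row estimation: getcurrentguids() is costed at the planner default of 1000 rows but actually returns 80000, and the join as a whole is estimated at 3 rows against 80963 actual, which is what steers the planner toward the nested loops. If getcurrentguids() takes no arguments (an assumption here; adjust the signature to match the real definition), a minimal sketch of telling the planner its true rowcount on 8.3 or later would be:

    -- Assumed zero-argument signature; the ROWS value is taken from the actual count in the plan above.
    ALTER FUNCTION getcurrentguids() ROWS 80000;

Re-running EXPLAIN ANALYZE afterwards would show whether the better estimate changes the join strategy.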
On Aug 6, 2009 2:30pm, wkipjohn@xxxxxxxxx wrote:
> Hi Scott,
>
> Yes I did that. And you are right: because I restarted my machine, the postgres cache is empty. I think PostgreSQL is reading all 1.8GB of data back into the cache when it does a seq scan on the status table.
>
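As a sanity check, the sizes PostgreSQL itself reports for the relations in the plan can be compared against that 1.8GB figure with a query along these lines (relation names taken from the plan above; adjust if they differ):

    -- On-disk sizes of the table and the index used in the plan
    SELECT pg_size_pretty(pg_relation_size('status')) AS table_size,
           pg_size_pretty(pg_relation_size('index_status_startdatetime')) AS index_size;

If the status table really is around 1.8GB, a cold-cache seq scan of it is bound by sequential read speed on that disk regardless of where the index lives.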
>
> On Aug 6, 2009 2:19pm, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> > On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin John <wkipjohn@xxxxxxxxx> wrote:
> > > I have a database (699221). It contains 1.8GB of data (707710). I am
> > > doing a complex query which requires loading an 80MB index (732287).
> > >
> > > I restarted Postgresql so the cache is empty and it has to read the
> > > table and index from disk, which I understand is an expensive process.
> > > But what I don't understand is that even though I split the index into
> > > a different tablespace located on a completely separate disk (mounted
> > > on /hdd2), there is still a very long I/O wait time. That index is the
> > > only thing on that disk. Any idea why? Or any way I can find out what
> > > it is waiting for? Thanks.
> >
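For reference, the separate-tablespace arrangement described above would look roughly like this; the tablespace name and directory are placeholders, and the index name is the one that appears in the plan:

    -- Hypothetical tablespace on the second disk; the directory must already exist and be owned by the postgres OS user
    CREATE TABLESPACE hdd2_idx LOCATION '/hdd2/pg_idx';
    ALTER INDEX index_status_startdatetime SET TABLESPACE hdd2_idx;

Note that this relocates only the index; the seq scan on status shown in the plan still reads the whole table from the original tablespace's disk.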
> > OK, before DTrace, did you run explain analyze on the query? I think
> > the output of that would be interesting.
> >
> > Looking at the DTrace output it looks to me like you're reading at
> > least one > 1GB table, since you're accessing a file with a .1 on it.
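That is consistent with how PostgreSQL stores large relations on disk: each relation lives in files named after its relfilenode under base/<database oid>/, split into 1GB segments (<relfilenode>, <relfilenode>.1, <relfilenode>.2, and so on), so a file ending in .1 means the relation has passed the 1GB mark. The numbers quoted in the original message (699221, 707710, 732287) look like such identifiers; assuming the relation names from the plan, they can be cross-checked with a catalog query like:

    -- Map the file numbers seen in DTrace back to the table and index
    SELECT relname, relfilenode
    FROM pg_class
    WHERE relname IN ('status', 'index_status_startdatetime');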