RE: Select count(*) on a 2B Rows Tables Takes ~20 Hours

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Just checked metrics while the count was running …

 

Read latency < 3.5 ms

Write latency < 4 ms

Read throughput ~ 40 MB/sec with sporadic peaks at 100

Read IOPS ~ 5000

QDepth < 3

 

 

----------------
Thank you

 

From: Tom Lane
Sent: Thursday, September 13, 2018 2:12 PM
To: Fd Habash
Cc: pgsql-performance@xxxxxxxxxxxxxxxxxxxx
Subject: Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours

 

Fd Habash <fmhabash@xxxxxxxxx> writes:

> Based on my research in the forums and Google , it is described in multiple places that ‘select count(*)’ is expected to be slow in Postgres because of the MVCC controls imposed upon the query leading a table scan. Also, the elapsed time increase linearly with table size.

> However, I do not know if elapsed time I’m getting is to be expected.

 

> Table reltuples in pg_class = 2,266,649,344 (pretty close)

> Query = select count(*) from jim.sttyations ;

> Elapsed time (ET) = 18.5 hrs

 

That's pretty awful.  My recollection is that in recent PG releases,

SELECT COUNT(*) runs at something on the order of 100ns/row given an

all-in-memory table.  Evidently you're rather badly I/O bound.

 

> This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).

 

Don't know much about Aurora, but I wonder whether you paid for

guaranteed (provisioned) IOPS, and if so what service level.

 

> refpep-> select count(*) from jim.sttyations;

>                                                     QUERY PLAN                                                    

> ------------------------------------------------------------------------------------------------------------------

>  Aggregate  (cost=73451291.77..73451291.78 rows=1 width=8)

>    Output: count(*)

>    ->  Index Only Scan using stty_indx_fk03 on jim.sttyations  (cost=0.58..67784668.41 rows=2266649344 width=0)

>          Output: vsr_number

> (4 rows)

 

Oh, hmm ... the 100ns figure I mentioned was for a seqscan.  IOS

could be a lot worse for a number of reasons, foremost being that

if the table isn't mostly all-visible then it'd involve a lot of

random heap access.  It might be interesting to try forcing a

seqscan plan (see enable_indexscan).

 

                       regards, tom lane

 


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux