Re: Select count(*), the sequel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Interesting data points. The amount of rows that you managed to
insert into PostgreSQL before Oracle gave up the ghost is 95%
of the rows in the Oracle version of the database. To count 5%
fewer rows, it took PostgreSQL 24 seconds longer. Or adjusting
for the missing rows, 52 seconds longer for the entire table
or 18% longer than the full table scan in Oracle. This seems to
be well within the table layout size differences, possibly due
to the fillfactor used --not really bad at all. Now the timings
due to algorithm changes are interesting as indicating the room
for improvement due to those type of changes. A parallel sequential
full-table scan in PostgreSQL could provide the same speed up.
Currently that is not possible ... but development continues a
pace...

In fact, developing such functions in PostgreSQL could end up
being less expensive long-term than licensing Oracle RAC. I think
the point that you have helped make is that PostgreSQL performs
very well for many use cases that have typically been relegated
to expensive commecial databases such as Oracle, DB2,...

Regards,
Ken

On Sat, Oct 16, 2010 at 12:53:50PM -0400, Mladen Gogala wrote:
> There was some doubt as for the speed of doing the select count(*) in 
> PostgreSQL and Oracle.
> To that end, I copied the most part of the Oracle table I used before to 
> Postgres. Although the copy
> wasn't complete, the resulting table is already significantly larger than 
> the table it was copied from. The result still shows that Oracle is 
> significantly faster:
> Oracle result:
>
> SQL> alter system flush buffer_cache;
>
> System altered.
>
> SQL> select /*+ full(NO) noparallel */ count(*) from ni_occurrence no;
>
>   COUNT(*)
> ----------
>  402062638
>
> Elapsed: 00:03:16.45
>
>
>
> Hints are necessary because Oracle table is declared as parallel and I 
> didn't want the PK index to be used for counting. Oracle has a good habit 
> of using PK's for counting, if available.
>
>
> SQL> select bytes/1048576 as MB
>   2  from user_segments
>   3  where segment_name='NI_OCCURRENCE';
>
>     MB
> ----------
>      35329
>
> Elapsed: 00:00:00.85
> SQL>
>
> So, oracle stores 402 million records in 35GB and counts them in 3 minutes 
> 16.45 seconds  The very same table was partially copied to Postgres, 
> copying died with ORA-01555 snapshot too old sometimes this morning. I ran 
> vacuumdb -f -z on the database after the copy completed and the results are 
> below.
>
> mgogala=# select count(*) from ni_occurrence;
>    count
> -----------
>  382400476
> (1 row)
>
> Time: 221716.466 ms
> mgogala=#
> mgogala=# select 221/60::real;
>      ?column?
> ------------------
>  3.68333333333333
> (1 row)
>
> Time: 0.357 ms
> mgogala=#
> mgogala=# select pg_size_pretty(pg_table_size('ni_occurrence'));
>  pg_size_pretty
> ----------------
>  46 GB
> (1 row)
>
> Time: 0.420 ms
> mgogala=#
>
> The database wasn't restarted, no caches were flushed, the comparison was 
> done with a serious advantage for PostgreSQL. Postgres needed 3.68 minutes 
> to complete the count which is about the same Oracle but still somewhat 
> slower. Also, I am worried about the sizes. Postgres table is 11GB larger 
> than the original, despite having less data. That was an unfair and 
> unbalanced comparison because Oracle's cache was flushed and Oracle was 
> artificially restrained to use the full table scan without the aid of 
> parallelism. Here is the same result, with no hints and the autotrace on, 
> which shows what happens if I turn the hints off:
>
> SQL> select count(*) from ni_occurrence no;
>
>   COUNT(*)
> ----------
>  402062638
>
> Elapsed: 00:00:52.61
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 53476935
>
> --------------------------------------------------------------------------------
> ----------------------------------------
>
> | Id  | Operation          | Name          | Rows  | Cost (%CPU)|
>  Time      |    TQ |IN-OUT| PQ Distrib |
>
> --------------------------------------------------------------------------------
> ----------------------------------------
>
> |   0 | SELECT STATEMENT      |              |    1 | 54001  (19)|
>  00:01:08 |       |      |           |
>
> |   1 |  SORT AGGREGATE       |              |    1 |           |
>       |       |      |           |
>
> |   2 |   PX COORDINATOR      |              |      |           |
>       |       |      |           |
>
> |   3 |    PX SEND QC (RANDOM)      | :TQ10000          |    1 |           
> |
>       |  Q1,00 | P->S | QC (RAND)  |
>
> |   4 |     SORT AGGREGATE      |              |    1 |           |
>       |  Q1,00 | PCWP |           |
>
> |   5 |      PX BLOCK ITERATOR      |              |   402M| 54001  (19)|
>  00:01:08 |  Q1,00 | PCWC |           |
>
> |   6 |       INDEX FAST FULL SCAN| IDX_NI_OCCURRENCE_PID |   402M| 54001  
> (19)|
>  00:01:08 |  Q1,00 | PCWP |           |
>
> --------------------------------------------------------------------------------
> ----------------------------------------
>
> It took just 52 seconds to count everything, but Oracle didn't even scan 
> the table, it scanned a unique index, in parallel. That is the algorithmic 
> advantage that forced me to restrict the execution plan with hints. My 
> conclusion is that the speed of the full scan is OK, about the same as 
> Oracle speed.  There are, however, three significant algorithm advantages 
> on the Oracle's side:
>
> 1) Oracle can use indexes to calculate "select count"
> 2) Oracle can use parallelism.
> 3) Oracle can use indexes in combination with the parallel processing.
>
>
>
> Here are the descriptions:
>
> SQL> desc ni_occurrence
>  Name                       Null?    Type
>  ----------------------------------------- -------- 
> ----------------------------
>  ID                       NOT NULL NUMBER(22)
>  PERMANENT_ID                   NOT NULL VARCHAR2(12)
>  CALL_LETTERS                   NOT NULL VARCHAR2(5)
>  AIRDATE                   NOT NULL DATE
>  DURATION                   NOT NULL NUMBER(4)
>  PROGRAM_TITLE                        VARCHAR2(360)
>  COST                            NUMBER(15)
>  ASSETID                        NUMBER(12)
>  MARKET_ID                        NUMBER
>  GMT_TIME                        DATE
>  ORIG_ST_OCC_ID                     NUMBER
>  EPISODE                        VARCHAR2(450)
>  IMPRESSIONS                        NUMBER
>
> SQL>
> mgogala=# \d ni_occurrence
>                Table "public.ni_occurrence"
>      Column     |            Type             | Modifiers
> ----------------+-----------------------------+-----------
>  id             | bigint                      | not null
>  permanent_id   | character varying(12)       | not null
>  call_letters   | character varying(5)        | not null
>  airdate        | timestamp without time zone | not null
>  duration       | smallint                    | not null
>  program_title  | character varying(360)      |
>  cost           | bigint                      |
>  assetid        | bigint                      |
>  market_id      | bigint                      |
>  gmt_time       | timestamp without time zone |
>  orig_st_occ_id | bigint                      |
>  episode        | character varying(450)      |
>  impressions    | bigint                      |
> Indexes:
>     "ni_occurrence_pk" PRIMARY KEY, btree (id)
>
> mgogala=#
>
> Oracle block is 16k, version is 10.2.0.5 RAC, 64 bit (is anybody still 
> using 32bit db servers?) . Postgres is 9.0.1, 64 bit. Both machines are 
> running Red Hat 5.5:
>
>
> [mgogala@lpo-postgres-d01 ~]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.5 (Tikanga)
> [mgogala@lpo-postgres-d01 ~]$
>
> Linux lpo-postgres-d01 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 
> x86_64 x86_64 x86_64 GNU/Linux
> [mgogala@lpo-postgres-d01 ~]$
>
> -- 
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
> The Leader in integrated Media Intelligence Solutions
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux