Re: oracle to psql migration - slow query in postgres

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Samuel Gendler wrote:


On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala <mladen.gogala@xxxxxxxxxxx <mailto:mladen.gogala@xxxxxxxxxxx>> wrote:

     If working with partitioning, be very aware that PostgreSQL
    optimizer has certain problems with partitions, especially with
    group functions. If you want speed, everything must be prefixed
    with partitioning column: indexes, expressions, joins. There is no
    explicit star schema and creating hash indexes will not buy you
    much, as a matter of fact, Postgres community is extremely
    suspicious of the hash indexes and I don't see them widely used.
    Having said that, I was able to solve the problems with my speed
    and partitioning.


Could you elaborate on this, please? What do you mean by 'everythin must be prefixed with partitioning column?'

--sam
If you have partitioned table part_tab, partitioned on the column item_date and if there is a global primary key in Oracle, let's call it item_id, then queries like "select * from part_tab where item_id=12345" will perform worse than queries with item_date"

select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the item_date column in the query above is not necessary, after all, the item_id column is the primary key. However, with range scans you will get much better results if you include the item_date column than if you use combination of columns without. The term "prefixed indexes" is borrowed from Oracle RDBMS and means that the beginning column in the index is the column on which the table is partitioned. Oracle, as opposed to Postgres, has global indexes, the indexes that span all partitions. PostgreSQL only maintains indexes on each of the partitions separately. Oracle calls such indexes "local indexes" and defines them on the partitioned table level. Here is a brief and rather succinct explanation of the terminology:

http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php


Of, course, there are other differences between Oracle partitioning and PostgreSQL partitioning. The main difference is $10000/CPU.
I am talking from experience:

news=> \d moreover_documents
             Table "moreover.moreover_documents"
       Column        |            Type             | Modifiers
----------------------+-----------------------------+-----------
document_id          | bigint                      | not null
dre_reference        | bigint                      | not null
headline             | character varying(4000)     |
author               | character varying(200)      |
url                  | character varying(1000)     |
rank                 | bigint                      |
content              | text                        |
stories_like_this    | character varying(1000)     |
internet_web_site_id | bigint                      | not null
harvest_time         | timestamp without time zone |
valid_time           | timestamp without time zone |
keyword              | character varying(200)      |
article_id           | bigint                      | not null
media_type           | character varying(20)       |
source_type          | character varying(20)       |
created_at           | timestamp without time zone |
autonomy_fed_at      | timestamp without time zone |
language             | character varying(150)      |
Indexes:
   "moreover_documents_pkey" PRIMARY KEY, btree (document_id)
Triggers:
insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH ROW EXE
CUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
Name                       Null?    Type
----------------------------------------- -------- ----------------------------
DOCUMENT#                   NOT NULL NUMBER
DRE_REFERENCE                   NOT NULL NUMBER
HEADLINE                        VARCHAR2(4000)
AUTHOR                         VARCHAR2(200)
URL                            VARCHAR2(1000)
RANK                            NUMBER
CONTENT                        CLOB
STORIES_LIKE_THIS                    VARCHAR2(1000)
INTERNET_WEB_SITE#               NOT NULL NUMBER
HARVEST_TIME                        DATE
VALID_TIME                        DATE
KEYWORD                        VARCHAR2(200)
ARTICLE_ID                   NOT NULL NUMBER
MEDIA_TYPE                        VARCHAR2(20)
CREATED_AT                        DATE
SOURCE_TYPE                        VARCHAR2(50)
PUBLISH_DATE                        DATE
AUTONOMY_FED_AT                    DATE
LANGUAGE                        VARCHAR2(150)

SQL>



I must say that it took me some time to get things right.

--

Mladen Gogala Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux