Re: Increasing pattern index query speed

Mario Weilguni <mweilguni@xxxxxxxx> · Wed, 26 Nov 2008 15:15:29 +0100

Andrus schrieb:
Richard,

These are the same but the times are different. I'd be very surprised if
you can reproduce these times reliably.

I re-tried today again and got same results: in production database 
pattern query  is many times slower that equality query.
toode and rid base contain only single product starting with 99000010
So both queries should scan exactly same numbers of rows.

Can I give you some wider-ranging suggestions Andrus?
1. Fix the vacuuming issue in your hash-join question.

I have ran VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=150000
So issue is fixed before those tests.

2. Monitor the system to make sure you know if/when disk activity is 
high.

I optimized this system. Now there are short (some seconds) sales 
queries about after every 5 - 300 seconds which cause few disk 
activity and add few new rows to some tables.
I havent seen that this activity affects to this test result.

3. *Then* start to profile individual queries and look into their plans.
Change the queries one at a time and monitor again.

How to change pattern matching query to faster ?

Andrus.

Btw.

I tried to reproduce this big difference in test server in 8.3 using 
sample data script below and got big difference but in opposite 
direction.

explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01' and ...

different where clauses produce different results:

AND orders_products.product_id =     '3370000000000000'  -- 880 .. 926 ms
AND  orders_products.product_id like '3370000000000000%' -- 41 ..98 ms

So patter index is 10 .. 20 times (!) faster always.
No idea why.

Test data creation script:

begin;
CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
$_$
SELECT 3500000;
$_$ LANGUAGE SQL;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE 
NOT NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, 
product_id CHAR(20),
 id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
      'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
FROM generate_series(0, Counter()/3 ) AS n;

SET work_mem TO 2097151;

INSERT INTO orders_products SELECT
  generate_series/3 as  order_id,
  ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS 
product_id
FROM generate_series(1, Counter());

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
orders(order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;

No wonder that = compares bad, you created the index this way:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );
why not:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id);

explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01'
AND orders_products.product_id =     '3370000000000000';

                                                               QUERY 
PLAN                                                                
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=3013.68..3013.69 rows=1 width=0) (actual 
time=8.206..8.207 rows=1 loops=1)
  ->  Nested Loop  (cost=10.83..3013.21 rows=185 width=0) (actual 
time=2.095..7.962 rows=189 loops=1)
        ->  Index Scan using products_pkey on products  
(cost=0.00..8.27 rows=1 width=18) (actual time=0.036..0.038 rows=1 loops=1)
              Index Cond: ((product_id)::text = '3370000000000000'::text)
        ->  Nested Loop  (cost=10.83..3003.09 rows=185 width=18) 
(actual time=2.052..7.474 rows=189 loops=1)
              ->  Bitmap Heap Scan on orders_products  
(cost=10.83..949.68 rows=253 width=22) (actual time=0.161..0.817 
rows=261 loops=1)
                    Recheck Cond: ((product_id)::text = 
'3370000000000000'::text)
                    ->  Bitmap Index Scan on foo  (cost=0.00..10.76 
rows=253 width=0) (actual time=0.116..0.116 rows=261 loops=1)
                          Index Cond: ((product_id)::text = 
'3370000000000000'::text)
              ->  Index Scan using orders_pkey on orders  
(cost=0.00..8.10 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=261)
                    Index Cond: (orders.order_id = 
orders_products.order_id)
                    Filter: (orders.order_date > '2006-01-01'::date)
Total runtime: 8.268 ms

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance