select with "like" from another table

"Anton Maksimenkov" <anton200@xxxxxxxxx> · Mon, 29 May 2006 15:53:24 +0600

Hi.

I have 2 tables - one with calls numbers and another with calls codes.
The structure almost like this:
billing=# \d a_voip
                                       Table "public.a_voip"
     Column       |            Type             |
Modifiers
--------------------+-----------------------------+-----------------------------------------------------
id                 | integer                     | not null default
nextval('a_voip_id_seq'::regclass)
tm                 | timestamp without time zone | not null
user_name          | character varying(50)       | not null
...
calling_station_id | character varying(20)       | not null
called_station_id  | character varying(20)       | not null
Indexes:
  "a_voip_pkey" PRIMARY KEY, btree (id)
  "a_voip_tm" btree (tm)

billing=# \d a_voip_codes
      Table "public.a_voip_codes"
Column |         Type          | Modifiers
--------+-----------------------+-----------
code   | integer               | not null
region | character varying(77) |
tarif  | numeric(13,7)         |
Indexes:
  "a_voip_codes_pkey" PRIMARY KEY, btree (code)

I need to select longest codes from a_voip_codes which match with the
the called_station_id. Because codes (very rarely) changes I construct
query

SELECT  user_name, called_station_id,
(SELECT code FROM a_voip_codes AS c where v.called_station_id like
c.code || '%' order by code desc limit 1) AS code
FROM a_voip AS v WHERE user_name = 'dixi' AND tm between '2006-04-01'
and '2006-05-01' group by user_name, called_station_id;

Analyzed variant
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE
user_name = 'dixi' AND tm between '2006-04-01' and '2006-05-01' group
by user_name, called_station_id;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=11515.93..12106.26 rows=69 width=22) (actual
time=215.719..677.044 rows=130 loops=1)
 ->  Bitmap Heap Scan on a_voip v  (cost=1106.66..11513.16 rows=554
width=22) (actual time=72.336..207.618 rows=848 loops=1)
       Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
       Filter: ((user_name)::text = 'dixi'::text)
       ->  Bitmap Index Scan on a_voip_tm  (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.441..69.441 rows=93594 loops=1)
             Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
 SubPlan
   ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=3.565..3.567 rows=1 loops=130)
         ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=3.560..3.560 rows=1 loops=130)
               Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 678.186 ms
(11 rows)

It is ugly, however not so long (but only for 69 rows). If I want to
select for ALL users it goes veeeery long:
billing=# explain analyze SELECT user_name, called_station_id, (SELECT
code FROM a_voip_codes AS c where v.called_station_id like c.code ||
'%' order by code desc limit 1) AS code FROM a_voip AS v WHERE tm
between '2006-04-01' and '2006-05-01' group by user_name,
called_station_id;

 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=11740.52..107543.85 rows=11198 width=22) (actual
time=779.488..75637.623 rows=20564 loops=1)
 ->  Bitmap Heap Scan on a_voip v  (cost=1106.66..11285.81
rows=90943 width=22) (actual time=72.539..274.850 rows=90204 loops=1)
       Recheck Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
       ->  Bitmap Index Scan on a_voip_tm  (cost=0.00..1106.66
rows=90943 width=0) (actual time=69.853..69.853 rows=93594 loops=1)
             Index Cond: ((tm >= '2006-04-01 00:00:00'::timestamp
without time zone) AND (tm <= '2006-05-01 00:00:00'::timestamp without
time zone))
 SubPlan
   ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=3.631..3.633 rows=1 loops=20564)
         ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=3.623..3.623 rows=1 loops=20564)
               Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 75652.199 ms
(10 rows)

So I want to ask, how can I reorganize query/structure for achieve
good performance?

I experiment with additional column (matched_code) for a_voip table
and think about RULE which will update that column "matched_code"
doing the (SELECT code FROM a_voip_codes AS c where
v.called_station_id like c.code || '%' order by code desc limit 1) job
when a_voip_codes updated. Or about TRIGGER. But this may also takes
long time, especially with short "code" numbers (like 1 digit). Look:

billing=# explain analyze UPDATE a_voip SET matched_code = (SELECT
code FROM a_voip_codes AS c WHERE a_voip.called_station_id like c.code
|| '%' order by code desc limit 1) WHERE matched_code LIKE '1%';

  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on a_voip  (cost=20.34..20467.27 rows=2057
width=168) (actual time=13.407..22201.369 rows=2028 loops=1)
 Filter: ((matched_code)::text ~~ '1%'::text)
 ->  Bitmap Index Scan on a_voip_matched_code  (cost=0.00..20.34
rows=2057 width=0) (actual time=2.035..2.035 rows=2028 loops=1)
       Index Cond: (((matched_code)::text >= '1'::character varying)
AND ((matched_code)::text < '2'::character varying))
 SubPlan
   ->  Limit  (cost=0.00..8.55 rows=1 width=4) (actual
time=10.909..10.911 rows=1 loops=2028)
         ->  Index Scan Backward using a_voip_codes_pkey on
a_voip_codes c  (cost=0.00..85.45 rows=10 width=4) (actual
time=10.923..10.923 rows=1 loops=2028)
               Filter: (($0)::text ~~ ((code)::text || '%'::text))
Total runtime: 23216.770 ms
(9 rows)

Is there any other ways to connect longest "code" with "called_station_id"?
--
engineer