Given a country table and an order table:
CREATE TABLE g.country ( -- inherited from table g.standard: oid uuid NOT NULL, -- … some more columns inherited… lisocode integer NOT NULL, -- Numeric ISO 3166 code szisoalpha2 character varying(2), -- The 2 letter country code szisoalpha3 character varying(3), -- The 3 letter country code szisonum3 character varying(3), -- The NUMERIC country code with leading zeros -- a few more columns here CONSTRAINT country_pkey PRIMARY KEY (oid) ) INHERITS (g.standard) WITH ( OIDS=FALSE ); This table is filled with ISO 3166 country codes. The order table: CREATE TABLE t."order" ( -- inherited from table g.standard: oid uuid NOT NULL, -- … some more columns inherited… szxmladdress text, -- many more columns in this table CONSTRAINT country_pkey PRIMARY KEY (oid) ) INHERITS (g.standard) WITH ( OIDS=FALSE ); A typical entry in t."order".szxmladdress looks like
<address entityType="Person" country="DE"> <field name="Address.oid">ae0eb84f-9b8b-4fef-b87a-d6757bdfeaf9</field> <field name="Address.uaddrtyperef">0bbdb48c-21c7-429e-944e-59a4d9ace9d5</field> <field name="Address.szstreet">Hauptstraße</field> <!—some more xml tags here --> </address> No XML field in the order table exceeds 2kB.
Getting the 2 letter country code from the xml address by this function:
CREATE OR REPLACE FUNCTION g.getxmladdresscountryid(xaddr text) RETURNS uuid AS $BODY$BEGIN RETURN (SELECT oID FROM G.Country WHERE szIsoAlpha2 = array_to_string(xpath('/address/@country', xAddr::xml), '') ORDER BY lIsoCode LIMIT 1); END$BODY$ LANGUAGE plpgsql IMMUTABLE; I know that this function is not really IMMUTABLE but the country table is changing only every several years.
Created a functional index on the order table: CREATE INDEX order_getxmladdresscountryid_fidx ON t."order" USING btree (g.getxmladdresscountryid(szxmladdress)); Joining order and country table limiting to 10 rows uses the index:
explain analyse SELECT GCountry.szISOAlpha2, GCountry.szISOAlpha3, GCountry.szISONum3 from t.order torder left join G.Country GCountry ON G.GetXmlAddressCountryID(TOrder.szXmlAddress) = GCountry.oID limit 10 Gives: Limit (cost=0.56..8.45 rows=10 width=11) (actual time=0.644..4.764 rows=10 loops=1) -> Merge Right Join (cost=0.56..10670.45 rows=13517 width=11) (actual time=0.642..4.754 rows=10 loops=1) Merge Cond: (gcountry.oid = g.getxmladdresscountryid(torder.szxmladdress)) -> Index Scan using country_pkey on country gcountry (cost=0.27..38.05 rows=258 width=27) (actual time=0.025..0.067 rows=32 loops=1) -> Index Scan using order_getxmladdresscountryid_fidx on "order" torder (cost=0.29..7019.04 rows=13517 width=366) (actual time=0.020..0.058 rows=10 loops=1) Planning time: 0.603 ms Execution time: 4.898 ms But when I remove the “limit 10”, the index is no longer used:
Hash Left Join (cost=17.81..5397.46 rows=13517 width=11) (actual time=0.941..4721.372 rows=13510 loops=1) Hash Cond: (g.getxmladdresscountryid(torder.szxmladdress) = gcountry.oid) -> Seq Scan on "order" torder (cost=0.00..3504.17 rows=13517 width=366) (actual time=0.011..27.542 rows=13510 loops=1) -> Hash (cost=14.58..14.58 rows=258 width=27) (actual time=0.427..0.427 rows=258 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 23kB -> Seq Scan on country gcountry (cost=0.00..14.58 rows=258 width=27) (actual time=0.008..0.226 rows=258 loops=1) Planning time: 0.580 ms Execution time: 4728.602 ms Event a “limit 100” does not use the index any more. However, the result of 4728 ms is almost a linear increase from 10 rows to 13500 rows.
I played around with COST of the function between 1 and 20000 and with several options on postgresql.conf without luck. A sequential scan on the order table alone is carried out in 15 ms.
Thanks for any idea. Klaus |