On 2021-02-16 19:30:23 +0100, Paul van der Linden wrote: > I have 2 functions: > > CREATE FUNCTION foo(a text, b text, c text) RETURNS text AS > $func$ > DECLARE > retVal text; > BEGIN > SELECT > CASE > ... snip long list containing various tests on a,b and c > END INTO retval; > RETURN retVal; > END > $func$ LANGUAGE PLPGSQL IMMUTABLE; > > CREATE FUNCTION bar(r text, geom geometry) RETURNS int AS > $func$ > DECLARE > retVal int; > BEGIN > SELECT > CASE > WHEN r='r1' AND st_area(geom)>100 THEN 1 > ... snip long list containing various tests on r and st_area(geom) > END INTO retval; > RETURN retVal; > END > $func$ LANGUAGE PLPGSQL IMMUTABLE; > > and a large table t (100M+ records) with columns a, b, c and geom running on PG > 11, on spinning disks with 64GB memory and 28 cores. > > When I create a simple geom index with CREATE INDEX ON t USING gist(geom) it > finishes in about an hour, but when I create a partial index using these 2 > functions > CREATE INDEX ON t USING gist(geom) WHERE bar(foo(a,b,c),geom)<12 it takes over > 20 hours... It has to call these functions for each of those 100M+ lines. So that's about 0.7 ms per line. Is that to be expected for what those functions do? I don't know. As depesz wrote, use smaller test case, like 1000 or 10000 rows. You can test the speed of those functions in isolation. Compare select a, b, c from t; select foo(a, b, c) from t; select bar(foo(a, b, c), geom) from t; That tells you how much time is spent in foo and bar. If the sum is close to those 0.7 ms, you know that you need to make those functions faster. I think SQL doesn't short-circuit, so in your function bar, that st_area(geom) in the first WHEN clause will always be called, regardless of the value of r. You can either call that once (as depesz suggested) or you can avoid calling it by nesting the cases: case when r = 'r1' then case when st_area(geom) > 100 then 1 else 2 end when r = 'r2' then .... end You can then go one step furthe: case when r like 'r_' then -- handle r1 ... r9 when r like 'r1_' then -- handle r10 ... r19 when r like 'r2_' then -- handle r20 ... r29 end That reduces the average number of comparisons from 50+ to 8. Depesz measured an overhead of 0.016ms per log message. That's low enough that you can probably afford a few messages, even if each function invocation only takes a few milliseconds. So definitely try that if you need to know where your functions spend their time. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@xxxxxx | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment:
signature.asc
Description: PGP signature