Hi,
I was playing around with PostgreSQL 11.2 (compiled from source, on
Ubuntu 18.04, running on an i7-6700K with 16 GB RAM) and LLVM, trying a
CPU-bound query that, in my simple mind, should benefit from JIT
compilation but (almost) doesn't.
1.) Test table with 195 columns of type 'numeric':
CREATE TABLE test (data0 numeric,data1 numeric,data2 numeric,data3 numeric,...,data192 numeric,data193 numeric,data194 numeric);
2.) Bulk-loaded 2 million rows of randomly generated data into this
table via COPY (and ran VACUUM and ANALYZE afterwards).
3.) Disabled parallel workers via 'set max_parallel_workers = 0' so that
only JIT performance is measured.
4.) Executed the query without JIT a couple of times to make sure the
table was in memory (I had iostat running in the background to verify
that no disk access was actually taking place):
test=# explain (analyze,buffers) SELECT SUM(data0) AS data0,SUM(data1) AS data1,SUM(data2) AS data2,...,SUM(data193) AS data193,SUM(data194) AS data194 FROM test;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=815586.31..815586.32 rows=1 width=6240) (actual time=14304.058..14304.058 rows=1 loops=1)
   Buffers: shared hit=64 read=399936
   -> Gather (cost=815583.66..815583.87 rows=2 width=6240) (actual time=14303.925..14303.975 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 0
         Buffers: shared hit=64 read=399936
         -> Partial Aggregate (cost=814583.66..814583.67 rows=1 width=6240) (actual time=14302.966..14302.966 rows=1 loops=1)
               Buffers: shared hit=64 read=399936
               -> Parallel Seq Scan on test (cost=0.00..408333.33 rows=833333 width=1170) (actual time=0.017..810.513 rows=2000000 loops=1)
                     Buffers: shared hit=64 read=399936
 Planning Time: 4.707 ms
 Execution Time: 14305.380 ms
5.) Now I turned on JIT and repeated the same query a couple of times.
This is what I got:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=815586.31..815586.32 rows=1 width=6240) (actual time=15558.558..15558.558 rows=1 loops=1)
   Buffers: shared hit=128 read=399872
   -> Gather (cost=815583.66..815583.87 rows=2 width=6240) (actual time=15558.450..15558.499 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 0
         Buffers: shared hit=128 read=399872
         -> Partial Aggregate (cost=814583.66..814583.67 rows=1 width=6240) (actual time=15557.541..15557.541 rows=1 loops=1)
               Buffers: shared hit=128 read=399872
               -> Parallel Seq Scan on test (cost=0.00..408333.33 rows=833333 width=1170) (actual time=0.020..941.925 rows=2000000 loops=1)
                     Buffers: shared hit=128 read=399872
 Planning Time: 11.230 ms
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 15.707 ms, Inlining 4.688 ms, Optimization 652.021 ms, Emission 939.556 ms, Total 1611.973 ms
 Execution Time: 15576.516 ms
(16 rows)
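For anyone wanting to reproduce steps 1.) to 4.), a script along these lines could generate the elided DDL, the full SUM query and the CSV input for COPY. This is only a sketch: the helper names are hypothetical, and the value distribution (uniform in [0, 1000), two decimals) is an assumption, because the post does not say how the random data was generated.

```python
import csv
import io
import random
from typing import TextIO

NUM_COLS = 195  # data0 .. data194, as in step 1.)


def create_table_sql(table: str = "test", ncols: int = NUM_COLS) -> str:
    """Spell out the CREATE TABLE statement that the post abbreviates with '...'."""
    cols = ",".join(f"data{i} numeric" for i in range(ncols))
    return f"CREATE TABLE {table} ({cols});"


def sum_query_sql(table: str = "test", ncols: int = NUM_COLS) -> str:
    """Build the SELECT SUM(dataN) AS dataN, ... query for every column."""
    sums = ",".join(f"SUM(data{i}) AS data{i}" for i in range(ncols))
    return f"SELECT {sums} FROM {table};"


def write_random_csv(out: TextIO, nrows: int, ncols: int = NUM_COLS, seed: int = 42) -> None:
    """Write nrows lines of random numeric values as CSV.

    Assumption: uniform values in [0, 1000) with two decimals; the
    original data distribution is unknown.
    """
    rng = random.Random(seed)
    writer = csv.writer(out, lineterminator="\n")
    for _ in range(nrows):
        writer.writerow(f"{rng.uniform(0, 1000):.2f}" for _ in range(ncols))


if __name__ == "__main__":
    print(create_table_sql())
    print(sum_query_sql())
    buf = io.StringIO()
    write_random_csv(buf, nrows=3)  # small preview; use 2_000_000 for the real load
    print(buf.getvalue(), end="")
```

To get the 2 million rows, call write_random_csv on a file opened with newline="" and load it with something like psql's \copy test FROM 'test.csv' WITH (FORMAT csv).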
So (ignoring the time spent on JIT compilation itself) this yields only
a ~2-3% performance increase. Is this because my query is just too
simple to actually benefit much, meaning the code path for the non-JIT
case is already fairly optimal? Or does JIT compilation only have a
large impact on the filter/WHERE part of a query, but not so much on
aggregation and tuple deforming?
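The ~2-3% figure can be checked against the two plans above. The sketch below assumes, as the "ignoring the time for JIT" remark implies, that the JIT "Total" is contained in the reported Execution Time. Under that assumption it subtracts the JIT total from the JIT run and compares the result with the non-JIT run. The function names are hypothetical helpers, not part of any PostgreSQL tooling.

```python
import re

# Figures copied from the two EXPLAIN (ANALYZE, BUFFERS) outputs above.
EXEC_NO_JIT_MS = 14305.380
EXEC_JIT_MS = 15576.516
TIMING_LINE = ("Timing: Generation 15.707 ms, Inlining 4.688 ms, "
               "Optimization 652.021 ms, Emission 939.556 ms, Total 1611.973 ms")


def jit_total_ms(timing_line: str) -> float:
    """Extract the 'Total ... ms' value from an EXPLAIN JIT Timing line."""
    m = re.search(r"Total ([0-9.]+) ms", timing_line)
    if m is None:
        raise ValueError("no 'Total ... ms' field in timing line")
    return float(m.group(1))


def speedup_excluding_jit(no_jit_ms: float, jit_ms: float, jit_overhead_ms: float) -> float:
    """Relative runtime reduction after removing compile time.

    Assumes jit_overhead_ms is included in jit_ms (the Execution Time).
    """
    return (no_jit_ms - (jit_ms - jit_overhead_ms)) / no_jit_ms


if __name__ == "__main__":
    overhead = jit_total_ms(TIMING_LINE)
    s = speedup_excluding_jit(EXEC_NO_JIT_MS, EXEC_JIT_MS, overhead)
    print(f"JIT-adjusted runtime: {EXEC_JIT_MS - overhead:.3f} ms, gain: {s:.1%}")
```

This gives an adjusted runtime of about 13964.5 ms versus 14305.4 ms, roughly a 2.4% gain. That is consistent with the estimate in the question.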
Thanks,
Tobias