JIT performance question

Tobias Gierke <tobias.gierke@xxxxxxxxxxxxxxxx> · Wed, 6 Mar 2019 18:16:08 +0100

Hi,

I was playing around with PostgreSQL 11.2 (i7-6700K with 16 GB RAM, on 
Ubuntu 18.04, compiled from source) and LLVM, trying a CPU-bound query 
that, in my simple mind, should benefit from JIT compilation but (almost) 
doesn't.

1.) Test table with 195 columns of type 'numeric':

CREATE TABLE test (data0 numeric, data1 numeric, data2 numeric,
data3 numeric, ..., data192 numeric, data193 numeric, data194 numeric);
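
Since the column list is elided above, one way to get a table of this shape without typing 195 definitions is to let the server build the statement. This is only a minimal sketch: it assumes psql as the client (\gexec is a psql meta-command) and assumes the elided columns follow the same dataN pattern. It is not necessarily how the table was actually created.

```sql
-- Build "CREATE TABLE test (data0 numeric, ..., data194 numeric)" as text,
-- then let psql execute the generated statement via \gexec.
SELECT format('CREATE TABLE test (%s)',
              string_agg(format('data%s numeric', i), ', ' ORDER BY i))
FROM generate_series(0, 194) AS i
\gexec
```

The same string_agg pattern could be used to generate the SUM(...) list for the query in step 4.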

2.) Bulk-loaded (via COPY) 2 million rows of randomly generated data 
into this table (and ran VACUUM and ANALYZE afterwards)

3.) Disabled parallel workers via 'set max_parallel_workers = 0', so 
that only JIT performance is measured
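
One caveat with this setting: max_parallel_workers = 0 only prevents workers from being launched. The planner still produces a parallel plan, which is why the plans in steps 4 and 5 show a Gather node with "Workers Planned: 2" and "Workers Launched: 0". A sketch of an alternative that makes the planner choose a serial plan in the first place (this is not what was used for the measurements in this post):

```sql
-- Session-level setting: with 0 workers per Gather, the planner does not
-- generate parallel plans, so no Gather / Partial Aggregate nodes appear.
SET max_parallel_workers_per_gather = 0;

-- Confirm the values in effect for this session.
SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;
```

With this setting the plan should collapse to a plain Aggregate over a Seq Scan, which makes the JIT and non-JIT timings easier to compare.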

4.) Executed the query without JIT a couple of times to make sure the 
table is in memory (I had iostat running in the background to verify 
that no disk access was actually taking place):

test=# explain (analyze,buffers) SELECT SUM(data0) AS data0, SUM(data1) AS data1,
SUM(data2) AS data2, ..., SUM(data193) AS data193, SUM(data194) AS data194
FROM test;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=815586.31..815586.32 rows=1 width=6240) (actual time=14304.058..14304.058 rows=1 loops=1)
   Buffers: shared hit=64 read=399936
   ->  Gather  (cost=815583.66..815583.87 rows=2 width=6240) (actual time=14303.925..14303.975 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 0
         Buffers: shared hit=64 read=399936
         ->  Partial Aggregate  (cost=814583.66..814583.67 rows=1 width=6240) (actual time=14302.966..14302.966 rows=1 loops=1)
               Buffers: shared hit=64 read=399936
               ->  Parallel Seq Scan on test (cost=0.00..408333.33 rows=833333 width=1170) (actual time=0.017..810.513 rows=2000000 loops=1)
                     Buffers: shared hit=64 read=399936
 Planning Time: 4.707 ms
 Execution Time: 14305.380 ms
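
A note on the Buffers lines: "read=399936" counts blocks that were not found in shared_buffers and had to be requested from the operating system. Those requests can be served from the OS page cache without physical I/O, which would be consistent with the iostat observation. A small sketch for comparing the table size with shared_buffers, assuming the default 8 kB block size:

```sql
-- Size of the shared buffer cache configured for this server.
SHOW shared_buffers;

-- On-disk size of the table as PostgreSQL sees it.
SELECT pg_size_pretty(pg_table_size('test'));

-- Cross-check from the plan: hit + read = 64 + 399936 = 400000 blocks.
-- Assumes the default block size of 8192 bytes.
SELECT pg_size_pretty(400000::bigint * 8192);
```

If the table is much larger than shared_buffers, every run of the query will show mostly "read" blocks even when the data is fully cached by the OS.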

5.) Then I turned on JIT and repeated the same query a couple of times. 
This is what I got:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=815586.31..815586.32 rows=1 width=6240) (actual time=15558.558..15558.558 rows=1 loops=1)
   Buffers: shared hit=128 read=399872
   ->  Gather  (cost=815583.66..815583.87 rows=2 width=6240) (actual time=15558.450..15558.499 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 0
         Buffers: shared hit=128 read=399872
         ->  Partial Aggregate  (cost=814583.66..814583.67 rows=1 width=6240) (actual time=15557.541..15557.541 rows=1 loops=1)
               Buffers: shared hit=128 read=399872
               ->  Parallel Seq Scan on test (cost=0.00..408333.33 rows=833333 width=1170) (actual time=0.020..941.925 rows=2000000 loops=1)
                     Buffers: shared hit=128 read=399872
 Planning Time: 11.230 ms
 JIT:
   Functions: 6
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 15.707 ms, Inlining 4.688 ms, Optimization 652.021 ms, Emission 939.556 ms, Total 1611.973 ms
 Execution Time: 15576.516 ms
(16 rows)
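
For a like-for-like comparison of the two runs, the JIT total can be subtracted from the JIT run's execution time. The sketch below does that arithmetic with the numbers from the two plans above. It assumes the reported JIT "Total" is fully contained in "Execution Time"; the column aliases are made up for illustration.

```sql
-- Execution time of the JIT run with compilation overhead removed,
-- and the relative improvement over the non-JIT run (14305.380 ms).
SELECT 15576.516 - 1611.973 AS jit_exec_minus_compile_ms,
       round(100 * (14305.380 - (15576.516 - 1611.973)) / 14305.380, 2)
           AS improvement_pct;
```

This works out to roughly 13964.5 ms of execution time without compilation overhead, an improvement of about 2.4% over the non-JIT run.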

So (ignoring the time spent on JIT compilation itself) this yields only 
a ~2-3% performance increase. Is this because my query is just too 
simple to benefit much, meaning the code path for the non-JIT case is 
already fairly optimal? Or does JIT compilation mainly have a large 
impact on the filter/WHERE part of a query, and not so much on 
aggregation / tuple deforming?
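
One way to narrow this down is to switch the two JIT components on and off separately and compare timings. The following is a sketch only: jit_expressions and jit_tuple_deforming are developer options in PostgreSQL 11, and the two-column query is a shortened placeholder for the full 195-column query above.

```sql
SET jit = on;                      -- JIT is off by default in PostgreSQL 11

-- Run A: JIT-compile expressions only, keep the interpreted tuple deforming.
SET jit_expressions = on;
SET jit_tuple_deforming = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT SUM(data0) AS data0, SUM(data1) AS data1 FROM test;  -- placeholder query

-- Run B: JIT-compile tuple deforming only.
SET jit_expressions = off;
SET jit_tuple_deforming = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT SUM(data0) AS data0, SUM(data1) AS data1 FROM test;  -- placeholder query

-- Restore the defaults for this session.
RESET jit_expressions;
RESET jit_tuple_deforming;
RESET jit;
```

The "Options:" line in the JIT section of each plan shows which components were actually active, so it can be used to confirm that the settings took effect.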

Thanks,
Tobias