Re: Performance difference in accessing different columns in a Postgres Table

Hi,

On 2018-07-31 12:56:26 -0400, Jeff Janes wrote:
> On Mon, Jul 30, 2018 at 1:23 PM, Andres Freund <andres@xxxxxxxxxxx> wrote:
> 
> > On 2018-07-30 07:19:07 -0400, Jeff Janes wrote:
> >
> > > And indeed, in my hands JIT makes it almost 3 times worse.
> >
> > Not in my measurement. Your example won't use JIT at all, because it's
> > below the cost threshold. So I think you might just be seeing cache +
> > hint bit effects?
> >
> 
> No, it is definitely JIT.  The explain plans show it, and the cost of the
> query is 230,000 while the default setting of jit_above_cost is 100,000.
> It is fully reproducible by repeatedly toggling the JIT setting.  It
> doesn't seem to be the cost of compiling the code that slows it down (I'm
> assuming the code is compiled once per tuple descriptor, not once per
> tuple), but rather the efficiency of the compiled code.

Interesting. I see a smaller benefit without optimization, but still a
benefit. I guess that depends on code emission.


> > > Run against ab87b8fedce3fa77ca0d6, I get 12669.619 ms for the 2nd JIT
> > > execution and 4594.994 ms for the JIT=off.
> >
> > Even with a debug LLVM build, which greatly increases compilation
> > overhead, I actually see quite the benefit when I force JIT to be used:
> >
> 
> I don't see a change when I compile without --enable-debug,
> and jit_debugging_support is off (in 11beta2 it doesn't exist).  How can I
> know whether I have a debug LLVM build, and turn it off if I do?

llvm-config --assertion-mode should tell you.


> > postgres[26832][1]=# SET jit_above_cost = -1; set jit_optimize_above_cost
> > = 0; set jit_inline_above_cost = 0;
> > postgres[26832][1]=# explain (analyze, buffers, timing off) select pk,
> > int200 from i200c200;
> >
> 
> Lowering jit_optimize_above_cost does redeem this for me.  It brings it
> back to a tie with jit=off.  I don't see any further improvement from
> lowering jit_inline_above_cost, and overall it is just a statistical tie
> with jit=off, not an improvement as you get, but at least it isn't a
> substantial loss.

Interesting; as posted, I do see quite measurable improvements. What's
your version of LLVM?


> Under what conditions would I want to do jit without doing optimizations on
> it?  Is there a rule of thumb that could be documented, or do we just use
> the experimental method for each query?

I don't think we quite know yet. Optimization of larger queries can
take a while. For expression-heavy queries there's a window where JITing
helps, but where the time spent optimizing doesn't yet pay for itself.


> I had previously done a poor-man's JIT where I created 4 versions of the
> main 'for' loop in slot_deform_tuple.  I branched on "if (hasnulls)", and
> then each branch had two loops: one for while 'slow' is false, and one for
> after 'slow' becomes true, so that we don't keep setting it true again in
> a tight loop once it already is.  I didn't see a noticeable improvement
> there (although perhaps I would have on different hardware), so I didn't
> see how JIT could help with this almost-entirely-null case.  I'm not
> trying to address JIT in general, just as it applies to this particular
> case.

I don't see how it follows from that observation that JITing can't be
beneficial. The bitmap access alone can be optimized if you unroll the
loop (as the offsets into it are then constant). The offset computations
into tts_values/tts_isnull aren't dynamic anymore. The loop counter is
gone. And nearly all tuples have hasnulls set, so specializing for that
case isn't going to gain you much; the branch is perfectly predictable
anyway.


> Unrelated to JIT and relevant to the 'select pk, int199' case but not the
> 'select pk, int200' case, it seems we have gone to some length to make slot
> deforming be efficient for incremental use, but then just deform in bulk
> anyway up to maximum attnum used in the query, at least in this case.  Is
> that because incremental deforming is not cache efficient?

Well, that's not *quite* how it works: we always deform up to the point
used in a certain "level" of the query. E.g. if a select's WHERE clause
needs something up to attribute 3, the seqscan might deform only up to
there, even though an aggregate on top of that might need up to 10.  But
yes, you're right, needlessly deforming incrementally isn't cache
efficient.

I think before long we're going to have to change the slot mechanism so
we don't deform columns we don't actually need. I.e. we'll need something
like a bitmap of needed columns, and skip over unneeded ones. When not
JITed, that'll allow us to skip copying such columns (removing an 8-byte
and a 1-byte write); when JITing, we can do better and e.g. entirely skip
processing fixed-width NOT NULL columns that aren't needed.

Greetings,

Andres Freund



