Re: Re: Increasing work_mem and shared_buffers on Postgres 9.2 significantly slows down queries

Petr Praus <petr@xxxxxxxxx> · Tue, 6 Nov 2012 14:08:48 -0600

On 6 November 2012 13:38, Gunnar "Nick" Bluth <gunnar.bluth@xxxxxxxxxxx> wrote:

    Am 06.11.2012 18:38, schrieb Petr
      Praus:

        Yes, but note that this happens only in Linux. Increasing
          work_mem on my iMac increases performance (but the queries are
          slower under OSX than on virtualized Ubuntu on the same
          machine). Over the weekend, I tried the same test on my Ubuntu
          home machine with Ivy Bridge i5 3570K and it also slows down
          (from ~900ms with work_mem=1MB to ~1200ms with work_mem=96MB).

    OS X is rather different from a memory access point of view, IIRC.
    So the direct comparison actually only shows how well the Linux FS
    cache works (for the temp files created with small work_mem ;-).

    The i5 puzzles me a bit though...

            I'm pretty sure you're hitting some subtle,
            memory-access-related cornercase here.

            The L2 cache of your X7350 CPUs is 2MB, could you run the
            tests with, say, 1, 2, 4 and 8MB of work_mem and post the
            results?

        I made a pgbench test with the same query and run it 25
          times (5 clients, 5 transactions each):
        work_mem   speed
        1MB        1794ms
        2MB        1877ms
        4MB        2084ms
        8MB        2141ms
        10MB       2124ms
        12MB       3018ms
        16MB       3004ms
        32MB       2999ms
        64MB       3015ms

        It seems that there is some sort of "plateau".

    Two, afaics. The 1->2 change hints towards occasionally breaching
    your L2 cache, so it can probably be ignored. The actual plateaus
    thus seem to be 0-2, 2-12, >= 12.

    It'd be interesting to see the EXPLAIN ANALYSE outputs for these
    levels, the buckets and batches in particular. I'd reckon we'll see
    significant changes at 2->4 and 10->12MB work_mem.

Here are the explains, I run the query a few times before actually taking the explain to warm up the caches. (I also noticed that explain slows down the query execution which is probably to be expected.)

2MB: http://explain.depesz.com/s/ul1
4MB: http://explain.depesz.com/s/IlVu
10MB: http://explain.depesz.com/s/afx3

12MB: http://explain.depesz.com/s/i0vQ

      So, to sum this up (and make
        someone more competent bite on it maybe ;-), on your SMP, FSB,
        "fake-multicore" system all "hash"-related works that
        potentially switch to different implementations internally (but
        w/out telling us so) when given more work_mem are slower.

    See other post... it actually does tell us (# of buckets/batches).
    However, the result is not good and could potentially be improved be
    twealing the statistic_targets of the joined tables/columns.

    I wonder why noone actually understanding the implementation chipped
    in yet... Andres, Greg, Tom, whoever actually understands what's
    happening here, anyone reading this? ;-)

    Cheers,

    -- 
Gunnar "Nick" Bluth
RHCE/SCLA

Mobil   +49 172 8853339
Email: gunnar.bluth@xxxxxxxxxxx
__________________________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX.  Ten years later
they are choosing Windows over UNIX.  What part of that message aren't you
getting? - Tom Payne