Re: 60 core performance with 9.3

Andres Freund <andres@xxxxxxxxxxxxxxx> · Fri, 11 Jul 2014 10:22:07 +0200

On 2014-07-11 12:40:15 +1200, Mark Kirkwood wrote:
> On 01/07/14 22:13, Andres Freund wrote:
> >On 2014-07-01 21:48:35 +1200, Mark Kirkwood wrote:
> >>- cherry picking the last 5 commits into 9.4 branch and building a package
> >>from that and retesting:
> >>
> >>Clients | 9.4 tps 60 cores (rwlock)
> >>--------+--------------------------
> >>6       |  70189
> >>12      | 128894
> >>24      | 233542
> >>48      | 422754
> >>96      | 590796
> >>192     | 630672
> >>
> >>Wow - that is more like it! Andres that is some nice work, we definitely owe
> >>you some beers for that :-) I am aware that I need to retest with an
> >>unpatched 9.4 src - as it is not clear from this data how much is due to
> >>Andres's patches and how much to the steady stream of 9.4 development. I'll
> >>post an update on that later, but figured this was interesting enough to
> >>note for now.
> >
> >Cool. That's what I like (and expect) to see :). I don't think unpatched
> >9.4 will show significantly different results than 9.3, but it'd be good
> >to validate that. If you do so, could you post the results in the
> >-hackers thread I just CCed you on? That'll help the work to get into
> >9.5.
> 
> So we seem to have nailed read only performance. Going back and revisiting
> read write performance finds:
> 
> Postgres 9.4 beta
> rwlock patch
> pgbench scale = 2000
> 
> max_connections = 200;
> shared_buffers = "10GB";
> maintenance_work_mem = "1GB";
> effective_io_concurrency = 10;
> wal_buffers = "32MB";
> checkpoint_segments = 192;
> checkpoint_completion_target = 0.8;
> 
> clients  | tps (32 cores) | tps
> ---------+----------------+---------
> 6        |   8313         |   8175
> 12       |  11012         |  14409
> 24       |  16151         |  17191
> 48       |  21153         |  23122
> 96       |  21977         |  22308
> 192      |  22917         |  23109

On that scale - that's bigger than shared_buffers IIRC - I'd not expect
the patch to make much of a difference.
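
A back-of-the-envelope check of that, as a minimal sketch. It assumes
the common rule of thumb of roughly 15 MB of data per pgbench scale
unit, which is an approximation and not something measured on this box;
the function names are made up for illustration.

```python
# Rough estimate of pgbench data size versus shared_buffers.
# ASSUMPTION: ~15 MB of table and index data per pgbench scale unit
# (a rule of thumb, not a value measured on the system in this thread).
MB_PER_SCALE_UNIT = 15


def pgbench_size_gb(scale, mb_per_unit=MB_PER_SCALE_UNIT):
    """Approximate data size in GB for a given pgbench scale factor."""
    return scale * mb_per_unit / 1024


def fits_in_shared_buffers(scale, shared_buffers_gb):
    """True if the estimated data set is no larger than shared_buffers."""
    return pgbench_size_gb(scale) <= shared_buffers_gb


if __name__ == "__main__":
    size = pgbench_size_gb(2000)
    fits = fits_in_shared_buffers(2000, 10)
    print(f"scale 2000 is roughly {size:.1f} GB; fits in 10GB: {fits}")
```

Under that assumption the data set is around three times shared_buffers,
so the workload is bound by buffer replacement and I/O, not by the lock
the patch addresses.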

> kernel.sched_autogroup_enabled=0
> kernel.sched_migration_cost_ns=5000000
> net.core.somaxconn=1024
> /sys/kernel/mm/transparent_hugepage/enabled [never]
> 
> Full report http://paste.ubuntu.com/7777886/

> #
>      8.82%        postgres  [kernel.kallsyms]        [k] _raw_spin_lock_irqsave
>                   |
>                   --- _raw_spin_lock_irqsave
>                      |
>                      |--75.69%-- pagevec_lru_move_fn
>                      |          __lru_cache_add
>                      |          lru_cache_add
>                      |          putback_lru_page
>                      |          migrate_pages
>                      |          migrate_misplaced_page
>                      |          do_numa_page
>                      |          handle_mm_fault
>                      |          __do_page_fault
>                      |          do_page_fault
>                      |          page_fault

So most of that spinlock time is spent in NUMA page migration. Can you
disable numa_balancing? I'm not sure whether your kernel version allows
that at runtime or whether you need to reboot.
Setting the kernel.numa_balancing sysctl to 0 might work. Otherwise you
probably need to boot with numa_balancing=disable.
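
To see whether the runtime knob exists at all, something like the
following sketch could be used. It only reads
/proc/sys/kernel/numa_balancing, the file behind the sysctl of the same
name; the function name is hypothetical, and treating any non-zero value
as "enabled" is an assumption.

```python
from pathlib import Path

# File behind the kernel.numa_balancing sysctl. It is absent on kernels
# built without automatic NUMA balancing.
NUMA_BALANCING_PATH = "/proc/sys/kernel/numa_balancing"


def numa_balancing_state(path=NUMA_BALANCING_PATH):
    """Return 'enabled', 'disabled' or 'unavailable' for the given knob."""
    try:
        value = Path(path).read_text().strip()
    except (FileNotFoundError, PermissionError):
        # No runtime knob (or not readable): a boot parameter may be needed.
        return "unavailable"
    # ASSUMPTION: "0" means off, any other value means some form of on.
    return "disabled" if value == "0" else "enabled"


if __name__ == "__main__":
    print(f"automatic NUMA balancing: {numa_balancing_state()}")
```

If this reports "unavailable", the sysctl route is out and only the boot
parameter remains.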

It'd also be worthwhile to test this with postgres started under
numactl --interleave=all.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services