> I'd strongly suggest doing a "perf record -g -a <wait a bit, ctrl-c>;
> perf report" run to check what's eating up the time.
Here is one example:
+ 38.87% swapper [kernel.kallsyms] [k] hypercall_page
+ 9.32% postgres [kernel.kallsyms] [k] hypercall_page
+ 6.80% postgres [kernel.kallsyms] [k] xen_set_pte_at
+ 5.83% postgres [kernel.kallsyms] [k] copy_user_generic_string
+ 2.06% postgres [kernel.kallsyms] [k] file_read_actor
+ 1.89% postgres postgres [.] heapgettup_pagemode
+ 1.83% postgres postgres [.] hash_search_with_hash_value
+ 1.33% postgres [kernel.kallsyms] [k] get_phys_to_machine
+ 1.25% postgres [kernel.kallsyms] [k] find_get_page
+ 1.00% postgres postgres [.] heapgetpage
+ 0.99% postgres [kernel.kallsyms] [k] radix_tree_lookup_element
+ 0.98% postgres postgres [.] advance_aggregates
+ 0.96% postgres postgres [.] ExecProject
+ 0.94% postgres postgres [.] advance_transition_function
+ 0.88% postgres postgres [.] ExecScan
+ 0.87% postgres postgres [.] HeapTupleSatisfiesMVCC
+ 0.86% postgres postgres [.] LWLockAcquire
+ 0.82% postgres [kernel.kallsyms] [k] put_page
+ 0.82% postgres postgres [.] MemoryContextReset
+ 0.80% postgres postgres [.] SeqNext
+ 0.78% postgres [kernel.kallsyms] [k] pte_mfn_to_pfn
+ 0.69% postgres postgres [.] ExecClearTuple
+ 0.57% postgres postgres [.] ExecProcNode
+ 0.54% postgres postgres [.] heap_getnext
+ 0.53% postgres postgres [.] LWLockRelease
+ 0.53% postgres postgres [.] ExecStoreTuple
+ 0.51% postgres libc-2.12.so [.] __GI___libc_read
+ 0.42% postgres [kernel.kallsyms] [k] xen_spin_lock
+ 0.40% postgres postgres [.] ReadBuffer_common
+ 0.38% postgres [kernel.kallsyms] [k] __do_fault
+ 0.37% postgres [kernel.kallsyms] [k] shmem_fault
+ 0.37% postgres [kernel.kallsyms] [k] unmap_single_vma
+ 0.35% postgres [kernel.kallsyms] [k] __wake_up_bit
+ 0.33% postgres postgres [.] StrategyGetBuffer
+ 0.33% postgres [kernel.kallsyms] [k] set_page_dirty
+ 0.33% postgres [kernel.kallsyms] [k] handle_pte_fault
+ 0.33% postgres postgres [.] ExecAgg
+ 0.31% postgres postgres [.] XidInMVCCSnapshot
+ 0.31% postgres [kernel.kallsyms] [k] __audit_syscall_entry
+ 0.31% postgres postgres [.] CheckForSerializableConflictOut
+ 0.29% postgres [kernel.kallsyms] [k] handle_mm_fault
+ 0.25% postgres [kernel.kallsyms] [k] shmem_getpage_gfp
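
For reference, the profile above was captured roughly as quoted at the top; a minimal sketch, assuming perf is installed and run as root (bounding the capture with "sleep 30" is just an alternative to waiting and hitting Ctrl-C):

  perf record -g -a sleep 30   # sample all CPUs system-wide, recording call graphs (-g)
  perf report                  # interactive report; the '+' entries expand into call chains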
On Wed, Dec 4, 2013 at 6:33 PM, Andres Freund <andres@xxxxxxxxxxxxxxx> wrote:
On 2013-12-04 14:27:10 -0200, Claudio Freire wrote:
> On Wed, Dec 4, 2013 at 9:19 AM, Metin Doslu <metin@xxxxxxxxxxxxx> wrote:
> >
> > Here are the results of "vmstat 1" while running 8 parallel TPC-H Simple
> > (#6) queries: Although there is no need for I/O, "wa" fluctuates between 0
> > and 1.
> >
> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 0 0 0 30093568 84892 38723896 0 0 0 0 22 14 0 0 100 0 0
> > 8 1 0 30043056 84892 38723896 0 0 0 0 27080 52708 16 14 70 0 0
> > 8 1 0 30006600 84892 38723896 0 0 0 0 44952 118286 43 44 12 1 0
> > 8 0 0 29986264 84900 38723896 0 0 0 20 28043 95934 49 42 8 1 0
> > 7 0 0 29991976 84900 38723896 0 0 0 0 8308 73641 52 42 6 0 0
> > 0 0 0 30091828 84900 38723896 0 0 0 0 3996 30978 23 24 53 0 0
> > 0 0 0 30091968 84900 38723896 0 0 0 0 17 23 0 0 100 0 0
>
>
> Notice the huge %sy
My bet is on transparent hugepage defragmentation. Alternatively it's
scheduler overhead, due to superfluous context switches around the buffer
mapping locks.
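If it is THP, something along these lines shows and, as root, disables the
defragmentation behaviour on most kernels (the sysfs path differs on some
RHEL 6 era systems, which use redhat_transparent_hugepage instead):

  cat /sys/kernel/mm/transparent_hugepage/defrag            # current setting, e.g. "[always] madvise never"
  echo never > /sys/kernel/mm/transparent_hugepage/defrag   # temporarily take defragmentation out of the picture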
I'd strongly suggest doing a "perf record -g -a <wait a bit, ctrl-c>;
perf report" run to check what's eating up the time.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services