Re: [HACKERS] Parallel Select query performance and shared buffers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>   I think all of this data cannot fit in shared_buffers, you might want to increase shared_buffers
>   to larger size (not 30GB but close to your data size) to see how it behaves.

When I use shared_buffers larger than my data size such as 10 GB, results scale nearly as expected at least for this instance type.

> You queries have Aggregation, ORDER/GROUP BY, so there is a chance
> that I/O can happen for those operation's
> if PG doesn't have sufficient memory (work_mem) to perform such operation.

I used work_mem as 32 MB, this should be enough for these queries. I also tested with higher values of work_mem, and didn't obverse any difference.

> Can you simplify your queries (simple scan or in other words no
> aggregation or other things) to see how
> they behave in your env., once you are able to see simple queries
> scaling as per your expectation, you
> can try with complex one's.

Actually we observe problem when queries start to get simpler such as select count(*). Here is the results table in more compact format:

                  select count(*) TPC-H Simple(#6) TPC-H Complex(#1)
1 Table / 1 query      1.5 s            2.5 s           8.4 s
2 Tables/ 2 queries    1.5 s            2.5 s           8.4 s
4 Tables/ 4 queries    2.0 s            2.9 s           8.8 s
8 Tables/ 8 queries    3.3 s            4.0 s           9.6 s

> Can we have the explain analyze of those queries, postgres
> configuration, perhaps vmstat output during execution?

postgres=# explain analyze SELECT count(*) from lineitem_1;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=199645.01..199645.02 rows=1 width=0) (actual time=11317.391..11317.393 rows=1 loops=1)
   ->  Seq Scan on lineitem_1  (cost=0.00..184641.81 rows=6001281 width=0) (actual time=0.011..5805.255 rows=6001215 loops=1)
 Total runtime: 11317.440 ms
(3 rows)

postgres=# explain analyze SELECT
postgres-#     sum(l_extendedprice * l_discount) as revenue
postgres-# FROM
postgres-#     lineitem_1
postgres-# WHERE
postgres-#     l_shipdate >= date '1994-01-01'
postgres-#     AND l_shipdate < date '1994-01-01' + interval '1' year
postgres-#     AND l_discount between 0.06 - 0.01 AND 0.06 + 0.01
postgres-#     AND l_quantity < 24;
                                                 QUERY PLAN                                                                           
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=260215.36..260215.37 rows=1 width=16) (actual time=1751.775..1751.776 rows=1 loops=1)
   ->  Seq Scan on lineitem_1  (cost=0.00..259657.82 rows=111508 width=16) (actual time=0.031..1630.449 rows=114160 loops=1)
         Filter: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.05::double precision) AND (l_discount <= 0.07::double precision) AND
 (l_quantity < 24::double precision))
         Rows Removed by Filter: 5887055
 Total runtime: 1751.830 ms
(5 rows)

postgres=# explain analyze SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem_1
WHERE
    l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=436342.68..436342.69 rows=6 width=36) (actual time=18720.932..18720.936 rows=4 loops=1)
   Sort Key: l_returnflag, l_linestatus
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=436342.49..436342.60 rows=6 width=36) (actual time=18720.887..18720.892 rows=4 loops=1)
         ->  Seq Scan on lineitem_1  (cost=0.00..199645.01 rows=5917437 width=36) (actual time=0.011..6754.619 rows=5916591 loops=1)
               Filter: (l_shipdate <= '1998-09-02 00:00:00'::timestamp without time zone)
               Rows Removed by Filter: 84624
 Total runtime: 18721.021 ms
(8 rows)


Here are the results of "vmstat 1" while running 8 parallel TPC-H Simple (#6) queries:  Although there is no need for I/O, "wa" fluctuates between 0 and 1. 

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 30093568  84892 38723896    0    0     0     0   22   14  0  0 100  0  0
 8  1      0 30043056  84892 38723896    0    0     0     0 27080 52708 16 14 70  0  0
 8  1      0 30006600  84892 38723896    0    0     0     0 44952 118286 43 44 12  1  0
 8  0      0 29986264  84900 38723896    0    0     0    20 28043 95934 49 42  8  1  0
 7  0      0 29991976  84900 38723896    0    0     0     0 8308 73641 52 42  6  0  0
 0  0      0 30091828  84900 38723896    0    0     0     0 3996 30978 23 24 53  0  0
 0  0      0 30091968  84900 38723896    0    0     0     0   17   23  0  0 100  0  0

I installed PostgreSQL 9.3.1 from source and in postgres configuration file I only changed shared buffers (4 GB) and work_mem (32 MB).

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux