Re: Very poor read performance, query independent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Charles Nadeau [mailto:charles.nadeau@xxxxxxxxx]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <ineyman@xxxxxxxxxxxxxx>
Cc: Jeff Janes <jeff.janes@xxxxxxxxx>; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: [PERFORM] Very poor read performance, query independent

 

Igor,

 

Initially temp_buffer was left to its default value (8MB). Watching the content of the directory that stores the temporary files, I found that I need at most 21GB of temporary files space. Should I set temp_buffer to 21GB?

Here is the explain you requested with work_mem set to 6GB:

flows=# set work_mem='6GB';

SET

flows=# explain (analyze, buffers) SELECT DISTINCT

   srcaddr,

   dstaddr,

   dstport,

   COUNT(*) AS conversation,

   SUM(doctets) / 1024 / 1024 AS mbytes

FROM

   flowscompact,

   mynetworks

WHERE

   mynetworks.ipaddr >>= flowscompact.srcaddr

   AND dstaddr IN

   (

      SELECT

         dstaddr

      FROM

         dstexterne

   )

GROUP BY

   srcaddr,

   dstaddr,

   dstport

ORDER BY

   mbytes DESC LIMIT 50;

                                                                                           QUERY PLAN                                                                                           

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual time=2227678.196..2227678.223 rows=50 loops=1)

   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154

   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual time=2227678.194..2227678.217 rows=50 loops=1)

         Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154

         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) (actual time=2227678.192..2227678.202 rows=50 loops=1)

               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))

               Sort Method: quicksort  Memory: 654395kB

               Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154

               ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)

                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport

                     Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154

                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)

                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport

                           Sort Method: external merge  Disk: 3049216kB

                           Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154

                           ->  Gather  (cost=30060688.07..48003007.07 rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 loops=1)

                                 Workers Planned: 12

                                 Workers Launched: 12

                                 Buffers: shared hit=728798037 read=82974833

                                 ->  Hash Semi Join  (cost=30059688.07..47951761.31 rows=220376 width=20) (actual time=1268845.181..2007864.725 rows=7057357 loops=13)

                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)

                                       Buffers: shared hit=728795193 read=82974833

                                       ->  Nested Loop  (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)

                                             Buffers: shared hit=590692229 read=14991777

                                             ->  Parallel Seq Scan on flows  (cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 rows=45371630 loops=13)

                                                   Buffers: shared hit=860990 read=14991777

                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=589831190)

                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)

                                                   Heap Fetches: 0

                                                   Buffers: shared hit=589831203

                                       ->  Hash  (cost=30059641.47..30059641.47 rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)

                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB

                                             Buffers: shared hit=138102964 read=67983056

                                             ->  HashAggregate  (cost=30059561.64..30059601.56 rows=13305 width=4) (actual time=1265248.165..1267432.083 rows=3803508 loops=13)

                                                   Group Key: flows_1.dstaddr

                                                   Buffers: shared hit=138102964 read=67983056

                                                   ->  Nested Loop Anti Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual time=0.389..1201072.707 rows=125838232 loops=13)

                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)

                                                         Rows Removed by Join Filter: 503353617

                                                         Buffers: shared hit=138102964 read=67983056

                                                         ->  Seq Scan on flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual time=0.322..343152.274 rows=589831190 loops=13)

                                                               Buffers: shared hit=138102915 read=67983056

                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=7667805470)

                                                               Buffers: shared hit=13

                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.006..0.007 rows=4 loops=13)

                                                                     Buffers: shared hit=13

 Planning time: 0.941 ms

 Execution time: 2228345.171 ms

(48 rows)

 

With a work_mem at 6GB, I noticed that for the first 20 minutes the query was running, the i/o wait was much lower, hovering aroun 3% then it jumped 45% until almost the end of the query. 

 

flowscompact and dstexterne are actually views. I use views to simplify query writing and to "abstract" queries that are use often in other queries. flowscompact is a view built on table flows (having about 590 million rows), it only keeps the most often used fields.

flows=# \d+ flowscompact;

                        View "public.flowscompact"

  Column   |           Type           | Modifiers | Storage | Description 

-----------+--------------------------+-----------+---------+-------------

 flow_id   | bigint                   |           | plain   | 

 sysuptime | bigint                   |           | plain   | 

 exaddr    | ip4                      |           | plain   | 

 dpkts     | integer                  |           | plain   | 

 doctets   | bigint                   |           | plain   | 

 first     | bigint                   |           | plain   | 

 last      | bigint                   |           | plain   | 

 srcaddr   | ip4                      |           | plain   | 

 dstaddr   | ip4                      |           | plain   | 

 srcport   | integer                  |           | plain   | 

 dstport   | integer                  |           | plain   | 

 prot      | smallint                 |           | plain   | 

 tos       | smallint                 |           | plain   | 

 tcp_flags | smallint                 |           | plain   | 

 timestamp | timestamp with time zone |           | plain   | 

View definition:

 SELECT flowstimestamp.flow_id,

    flowstimestamp.sysuptime,

    flowstimestamp.exaddr,

    flowstimestamp.dpkts,

    flowstimestamp.doctets,

    flowstimestamp.first,

    flowstimestamp.last,

    flowstimestamp.srcaddr,

    flowstimestamp.dstaddr,

    flowstimestamp.srcport,

    flowstimestamp.dstport,

    flowstimestamp.prot,

    flowstimestamp.tos,

    flowstimestamp.tcp_flags,

    flowstimestamp."timestamp"

   FROM flowstimestamp;

mynetworks is a table having one column and 4 rows; it contains a list of our network networks:

flows=# select * from mynetworks;

     ipaddr     

----------------

(4 row)

flows=# \d+ mynetworks;

                    Table "public.mynetworks"

 Column | Type | Modifiers | Storage | Stats target | Description 

--------+------+-----------+---------+--------------+-------------

 ipaddr | ip4r |           | plain   |              | 

Indexes:

    "mynetworks_ipaddr_idx" gist (ipaddr)

dstexterne is a view listing all the destination IPv4 addresses not inside our network; it has one column and 3.8 million rows.

flows=# \d+ dstexterne;

              View "public.dstexterne"

 Column  | Type | Modifiers | Storage | Description 

---------+------+-----------+---------+-------------

 dstaddr | ip4  |           | plain   | 

View definition:

 SELECT DISTINCT flowscompact.dstaddr

   FROM flowscompact

     LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r

  WHERE mynetworks.ipaddr IS NULL;

Thanks!

 

Charles

 

Charles,

 

Don’t change temp_buffers.

Try to increase work_mem even more, say work_mem=’12GB’, because it’s still using disk for sorting (starting around 20th minute as you noticed).

See if this:

“Sort Method: external merge  Disk: 3049216kB”

goes away.

Igor

 


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux