From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Igor Neyman
Sent: Friday, July 14, 2017 3:13 PM
To: Charles Nadeau <charles.nadeau@xxxxxxxxx>
Cc: Jeff Janes <jeff.janes@xxxxxxxxx>; pgsql-performance@postgresql.org
Subject: Re: Very poor read performance, query independent
From: Charles Nadeau [mailto:charles.nadeau@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <ineyman@xxxxxxxxxxxxxx>
Cc: Jeff Janes <jeff.janes@xxxxxxxxx>; pgsql-performance@postgresql.org
Subject: Re: Very poor read performance, query independent
Igor,
Initially temp_buffers was left at its default value (8MB). Watching the contents of the directory that stores the temporary files, I found that I need at most 21GB of temporary file space. Should I set temp_buffers to 21GB?
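(As a cross-check, the cumulative temporary-file usage can also be read from pg_stat_database, something like:

SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname = current_database();

These counters accumulate since the last statistics reset, so they complement watching the pgsql_tmp directory.)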
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
srcaddr,
dstaddr,
dstport,
COUNT(*) AS conversation,
SUM(doctets) / 1024 / 1024 AS mbytes
FROM
flowscompact,
mynetworks
WHERE
mynetworks.ipaddr >>= flowscompact.srcaddr
AND dstaddr IN
(
SELECT
dstaddr
FROM
dstexterne
)
GROUP BY
srcaddr,
dstaddr,
dstport
ORDER BY
mbytes DESC LIMIT 50;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual time=2227678.194..2227678.217 rows=50 loops=1)
         Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) (actual time=2227678.192..2227678.202 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 654395kB
               Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
               ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 3049216kB
                           Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                           ->  Gather  (cost=30060688.07..48003007.07 rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 loops=1)
                                 Workers Planned: 12
                                 Workers Launched: 12
                                 Buffers: shared hit=728798037 read=82974833
                                 ->  Hash Semi Join  (cost=30059688.07..47951761.31 rows=220376 width=20) (actual time=1268845.181..2007864.725 rows=7057357 loops=13)
                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)
                                       Buffers: shared hit=728795193 read=82974833
                                       ->  Nested Loop  (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
                                             Buffers: shared hit=590692229 read=14991777
                                             ->  Parallel Seq Scan on flows  (cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 rows=45371630 loops=13)
                                                   Buffers: shared hit=860990 read=14991777
                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=589831190)
                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=589831203
                                       ->  Hash  (cost=30059641.47..30059641.47 rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
                                             Buffers: shared hit=138102964 read=67983056
                                             ->  HashAggregate  (cost=30059561.64..30059601.56 rows=13305 width=4) (actual time=1265248.165..1267432.083 rows=3803508 loops=13)
                                                   Group Key: flows_1.dstaddr
                                                   Buffers: shared hit=138102964 read=67983056
                                                   ->  Nested Loop Anti Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual time=0.389..1201072.707 rows=125838232 loops=13)
                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join Filter: 503353617
                                                         Buffers: shared hit=138102964 read=67983056
                                                         ->  Seq Scan on flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual time=0.322..343152.274 rows=589831190 loops=13)
                                                               Buffers: shared hit=138102915 read=67983056
                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=7667805470)
                                                               Buffers: shared hit=13
                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.006..0.007 rows=4 loops=13)
                                                                     Buffers: shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)
With work_mem at 6GB, I noticed that for the first 20 minutes the query was running, the I/O wait was much lower, hovering around 3%; then it jumped to 45% until almost the end of the query.
flowscompact and dstexterne are actually views. I use views to simplify query writing and to "abstract" queries that are used often in other queries. flowscompact is a view built on the table flows (about 590 million rows); it only keeps the most often used fields.
flows=# \d+ flowscompact;
View "public.flowscompact"
   Column   |           Type           | Modifiers | Storage | Description
------------+--------------------------+-----------+---------+-------------
 flow_id    | bigint                   |           | plain   |
 sysuptime  | bigint                   |           | plain   |
 exaddr     | ip4                      |           | plain   |
 dpkts      | integer                  |           | plain   |
 doctets    | bigint                   |           | plain   |
 first      | bigint                   |           | plain   |
 last       | bigint                   |           | plain   |
 srcaddr    | ip4                      |           | plain   |
 dstaddr    | ip4                      |           | plain   |
 srcport    | integer                  |           | plain   |
 dstport    | integer                  |           | plain   |
 prot       | smallint                 |           | plain   |
 tos        | smallint                 |           | plain   |
 tcp_flags  | smallint                 |           | plain   |
 timestamp  | timestamp with time zone |           | plain   |
View definition:
SELECT flowstimestamp.flow_id,
flowstimestamp.sysuptime,
flowstimestamp.exaddr,
flowstimestamp.dpkts,
flowstimestamp.doctets,
flowstimestamp.first,
flowstimestamp.last,
flowstimestamp.srcaddr,
flowstimestamp.dstaddr,
flowstimestamp.srcport,
flowstimestamp.dstport,
flowstimestamp.prot,
flowstimestamp.tos,
flowstimestamp.tcp_flags,
flowstimestamp."timestamp"
FROM flowstimestamp;
mynetworks is a table having one column and 4 rows; it contains a list of our networks:
flows=# select * from mynetworks;
ipaddr
----------------
(4 rows)
flows=# \d+ mynetworks;
Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+---------+--------------+-------------
 ipaddr | ip4r |           | plain   |              |
Indexes:
"mynetworks_ipaddr_idx" gist (ipaddr)
dstexterne is a view listing all the destination IPv4 addresses not inside our network; it has one column and 3.8 million rows.
flows=# \d+ dstexterne;
View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description
---------+------+-----------+---------+-------------
 dstaddr | ip4  |           | plain   |
View definition:
SELECT DISTINCT flowscompact.dstaddr
FROM flowscompact
LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
WHERE mynetworks.ipaddr IS NULL;
Thanks!
Charles
Charles,
Also, let’s try to simplify your query and see if it performs better.
You are grouping by srcaddr, dstaddr, dstport, which makes DISTINCT unnecessary.
And after simplifying the WHERE clause (let me know if the result is not what you want), the query looks like:
SELECT srcaddr, dstaddr, dstport,
COUNT(*) AS conversation,
SUM(doctets) / 1024 / 1024 AS mbytes
FROM flowscompact
WHERE srcaddr IN (SELECT ipaddr FROM mynetworks)
AND dstaddr NOT IN (SELECT ipaddr FROM mynetworks)
GROUP BY srcaddr, dstaddr, dstport
ORDER BY mbytes DESC
LIMIT 50;
Now, you didn’t provide the definition of the flowstimestamp table.
If this table doesn’t have an index on (srcaddr, dstaddr, dstport), creating one should help (I think).
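For example, something along these lines (the index name is just illustrative, and I'm assuming the base table behind flowscompact/flowstimestamp is flows; adjust to your schema):

CREATE INDEX CONCURRENTLY flows_srcaddr_dstaddr_dstport_idx
    ON flows (srcaddr, dstaddr, dstport);

CONCURRENTLY avoids blocking writes to the table while the index is built.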
Igor
http://charlesnadeau.blogspot.com/