Re: Hash join on int takes 8..114 seconds


 



Scott,

thank you.

> work_mem = 512
This is very easy to try.  You can change work_mem for just a single
session, and this can in some cases help performance quite a bit, and in
others not at all.
I would not recommend having it lower than at least 4MB on a server like
that unless you have a lot of concurrently active queries / connections.
To try it, simply use the SET command.  To try out 32MB, just do:
SET work_mem = '32MB';

8.1.4 requires an integer value.
SET work_mem = 33554432;
causes:
ERROR: 33554432 is outside the valid range for parameter "work_mem" (64 .. 2097151)

So the maximum allowed value seems to be 2 MB.
I tested this while the server was idle, running both queries several times after VACUUM FULL had finished.

SET work_mem = 2097151;


No, not really. The work_mem value is specified in kilobytes, so you've set it to 2GB :-)
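For example, to get roughly the 32 MB Scott suggested (just a sketch; on 8.1 the value is in kilobytes and unit suffixes like '32MB' are not accepted):

SET work_mem = 32768;   -- 32768 kB = 32 MB, for this session only
SHOW work_mem;          -- confirms the per-session setting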

set search_path to firma2,public;
explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
"Aggregate  (cost=177291.36..177291.37 rows=1 width=0) (actual
time=5153.856..5153.859 rows=1 loops=1)"
"  ->  Nested Loop  (cost=73607.45..177229.96 rows=24561 width=0) (actual
time=1395.935..5071.247 rows=21541 loops=1)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1
width=24) (actual time=0.078..0.087 rows=1 loops=1)"
"              Index Cond: ('X05'::bpchar = toode)"
"        ->  Hash Join  (cost=73607.45..176978.33 rows=24561 width=24)
(actual time=1395.836..4911.425 rows=21541 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Bitmap Heap Scan on rid  (cost=4083.10..101802.05
rows=270316 width=28) (actual time=238.578..2367.189 rows=278247 loops=1)"
"                    Recheck Cond: (toode = 'X05'::bpchar)"
"                    ->  Bitmap Index Scan on rid_toode_idx
(cost=0.00..4083.10 rows=270316 width=0) (actual time=150.868..150.868
rows=278248 loops=1)"
"                          Index Cond: (toode = 'X05'::bpchar)"
"              ->  Hash  (cost=69242.72..69242.72 rows=112651 width=4)
(actual time=1146.081..1146.081 rows=110170 loops=1)"
"                    ->  Bitmap Heap Scan on dok  (cost=1492.28..69242.72
rows=112651 width=4) (actual time=46.210..696.803 rows=110170 loops=1)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          ->  Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1492.28 rows=112651 width=0) (actual time=33.938..33.938
rows=110232 loops=1)"
"                                Index Cond: (kuupaev >=
'2008-09-01'::date)"
"Total runtime: 5154.911 ms"


SET work_mem to default;
set search_path to firma2,public;
explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
"Aggregate  (cost=181869.36..181869.37 rows=1 width=0) (actual
time=7807.867..7807.871 rows=1 loops=1)"
"  ->  Nested Loop  (cost=74048.45..181807.96 rows=24561 width=0) (actual
time=2607.429..7728.138 rows=21541 loops=1)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1
width=24) (actual time=0.079..0.091 rows=1 loops=1)"
"              Index Cond: ('X05'::bpchar = toode)"
"        ->  Hash Join  (cost=74048.45..181556.33 rows=24561 width=24)
(actual time=2607.332..7569.612 rows=21541 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Bitmap Heap Scan on rid  (cost=4083.10..101802.05
rows=270316 width=28) (actual time=1147.071..4528.845 rows=278247 loops=1)"
"                    Recheck Cond: (toode = 'X05'::bpchar)"
"                    ->  Bitmap Index Scan on rid_toode_idx
(cost=0.00..4083.10 rows=270316 width=0) (actual time=1140.337..1140.337
rows=278248 loops=1)"
"                          Index Cond: (toode = 'X05'::bpchar)"
"              ->  Hash  (cost=69242.72..69242.72 rows=112651 width=4)
(actual time=1240.988..1240.988 rows=110170 loops=1)"
"                    ->  Bitmap Heap Scan on dok  (cost=1492.28..69242.72
rows=112651 width=4) (actual time=68.053..769.448 rows=110170 loops=1)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          ->  Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1492.28 rows=112651 width=0) (actual time=61.358..61.358
rows=110232 loops=1)"
"                                Index Cond: (kuupaev >=
'2008-09-01'::date)"
"Total runtime: 7808.174 ms"

In both cases, vmstat 2 shows only CPU activity while the queries are running:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
...
 0  0    232 123692      0 1891044    0    0     0     0  252    17  0  0 100  0
 0  0    232 123692      0 1891044    0    0     0     0  252    17  0  0 100  0
 0  0    232 123684      0 1891044    0    0     0   162  254    22  0  0 100  0
 0  0    232 123684      0 1891044    0    0     0     0  252    18  0  0 100  0
 1  0    232 123056      0 1891444    0    0     0    13  254    21 62  5 34  0   <---- start of slower query
 1  0    232 102968      0 1911060    0    0     0    16  252    18 26 75  0  0
 1  0    232  77424      0 1936996    0    0     0     0  252    18 37 63  0  0
 1  0    232  71464      0 1941928    0    0     0    73  260    34 38 62  0  0
 0  0    232 123420      0 1891044    0    0     0    32  257    31  8 15 77  0 <-------- end of slower query
 0  0    232 123420      0 1891044    0    0     0    25  255    24  0  0 100  0
 0  0    232 123420      0 1891044    0    0     0    28  255    27  0  0 100  0


Well, this parameter specifies how much memory may be used for in-memory sorts and hash tables. If more memory is needed, the sort or hash table spills to disk.

I guess the difference in speed is not huge, so in this case the hash table (used for the join) probably fits into the default work_mem of 1024 kB.

Is it safe to set

work_mem = 2097151

in the postgresql.conf file?

Each sort in a query uses a separate area of memory (up to work_mem). So if you have 10 sessions each running a query with 2 sorts, that may occupy

   10 * 2 * work_mem

of memory. Let's suppose you set a reasonable value (say 8096 kB) instead of 2 GB. That gives about 160 MB.
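Spelled out with that 8096 kB figure:

   10 sessions * 2 sorts * 8096 kB = 161920 kB, i.e. roughly 158 MB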

Anyway this depends - if you have a lot of slow queries caused by on-disk sorts / hash tables, use a higher value. Otherwise leave it as it is.
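One common compromise (just a sketch, not something tuned to your workload) is to keep a modest default in postgresql.conf and raise work_mem only in the sessions that run the heavy reporting queries:

   # in postgresql.conf (value is in kB on 8.1):
   work_mem = 8192

   -- in the reporting session only:
   SET work_mem = 262144;   -- 256 MB for this session
   -- run the big query here
   RESET work_mem;

That way a large number of ordinary connections cannot multiply a huge work_mem.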

First, demonstrate that it is all or mostly in memory -- use iostat or
other tools to ensure that there is not much disk activity during the
query.  If your system doesn't have iostat installed, it should be
installed.  It is a very useful tool.

# iostat
bash: iostat: command not found
# locate iostat
/usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt

I have little experience with Linux, and no idea how to install or compile iostat on this system.
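For what it's worth, on Gentoo iostat comes with the sysstat package, so something like this should work (a sketch, untested on your particular box):

# emerge sys-apps/sysstat
# iostat -x 2

The -x flag gives extended per-device statistics (utilization, average wait), which is what you want to watch while the slow query runs.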

If it is all cached in memory, you may want to ensure that your
shared_buffers is a reasonable size so that there is less shuffling of data from the kernel to postgres and back. Generally, shared_buffers works best
between 5% and 25% of system memory.

currently shared_buffers = 15000

That's 120 MB (15000 * 8 kB pages), i.e. about 6% of the memory. It might be a little bit higher, but it seems reasonable.
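If you do want to raise it, something like this in postgresql.conf would land around 10% of the roughly 2 GB of RAM (a sketch only; on 8.1 changing shared_buffers requires a server restart, and the kernel's SHMMAX may need to be raised to match):

   shared_buffers = 25000    # 8 kB pages, i.e. roughly 200 MB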

If it is completely CPU bound then the work done for the query has to be
reduced by altering the plan to a more optimal one or making the work it
has to do at each step easier.  Most of the ideas in this thread revolve
around those things.

When running on a loaded server, even after VACUUM FULL, the response time with the original work_mem is longer, probably because blocks must be fetched from disk.

Based on the time it took to do the vacuum, I suspect your disk subsystem
is a bit slow. If it can be determined that there is much disk I/O in your
use cases, there are generally several things that can be done to tune
Linux I/O.  The main ones in my experience are the 'readahead' value for
each disk which helps sequential reads significantly, and trying out the
linux 'deadline' scheduler and comparing it to the more commonly used 'cfq'
scheduler.  If the system is configured with the anticipatory scheduler,
absolutely switch to cfq or deadline as the anticipatory scheduler will
perform horribly poorly for a database.
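For reference, these are the knobs being described; /dev/sda below is just a placeholder for whatever device holds the database, and the deadline scheduler has to be available in your kernel (a rough sketch, run as root):

# blockdev --getra /dev/sda
# blockdev --setra 8192 /dev/sda
# cat /sys/block/sda/queue/scheduler
# echo deadline > /sys/block/sda/queue/scheduler

The first two commands read and set the per-device readahead (in 512-byte sectors); the last two list the available I/O schedulers (the current one is shown in brackets) and switch to deadline. Both settings can be switched back just as easily, so it is cheap to benchmark each combination against your slow query.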

This is a 3-year-old cheap server.
No idea what to configure.

Well, how could we know that if you don't?

Anyway, the options mentioned by Scott Carey are related to the Linux kernel, so they shouldn't be a problem to change.

There is also another similar server which has 1.2 GB more usable memory. No idea whether it is worth switching to it.
After a few years the sales data will exceed even that extra memory.

Given that the performance issues are caused by bloated tables and/or a slow I/O subsystem, moving to a similar system won't help, I guess.

regards
Tomas

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
