FW: Queries becoming slow under heavy load

"Anne Rosset" <arosset@xxxxxxxxxx> · Wed, 26 Jan 2011 08:04:41 -0800

Sorry it seems like the postgres configuration didn't come thru the
first time.

name	|	setting
---------------------------------	+
--------------------------
add_missing_from	|	off
allow_system_table_mods	|	off
archive_command	|	(disabled)
archive_mode	|	off
archive_timeout	|	0
array_nulls	|	on
authentication_timeout	|	1min
autovacuum	|	on
autovacuum_analyze_scale_factor	|	0.1
autovacuum_analyze_threshold	|	250
autovacuum_freeze_max_age	|	200000000
autovacuum_max_workers	|	3
autovacuum_naptime	|	5min
autovacuum_vacuum_cost_delay	|	20ms
autovacuum_vacuum_cost_limit	|	-1
autovacuum_vacuum_scale_factor	|	0.2
autovacuum_vacuum_threshold	|	500
backslash_quote	|	safe_encoding
bgwriter_delay	|	200ms
bgwriter_lru_maxpages	|	100
bgwriter_lru_multiplier	|	2
block_size	|	8192
bonjour_name	|	
check_function_bodies	|	on
checkpoint_completion_target	|	0.5
checkpoint_segments	|	3
checkpoint_timeout	|	5min
checkpoint_warning	|	30s
client_encoding	|	UTF8
client_min_messages	|	notice
commit_delay	|	250
commit_siblings	|	10
constraint_exclusion	|	off
cpu_index_tuple_cost	|	0.005
cpu_operator_cost	|	0.0025
cpu_tuple_cost	|	0.01
custom_variable_classes	|	
DateStyle	|	ISO, MDY
db_user_namespace	|	off
deadlock_timeout	|	1s
debug_assertions	|	off
debug_pretty_print	|	off
debug_print_parse	|	off
debug_print_plan	|	off
debug_print_rewritten	|	off
default_statistics_target	|	10
default_tablespace	|	
default_text_search_config	|	pg_catalog.simple
default_transaction_isolation	|	read committed
default_transaction_read_only	|	off
default_with_oids	|	off
effective_cache_size	|	4000000kB
enable_bitmapscan	|	on
enable_hashagg	|	on
enable_hashjoin	|	on
enable_indexscan	|	on
enable_mergejoin	|	off
enable_nestloop	|	on
enable_seqscan	|	on
enable_sort	|	on
enable_tidscan	|	on
escape_string_warning	|	on
explain_pretty_print	|	on
extra_float_digits	|	0
from_collapse_limit	|	8
fsync	|	on
full_page_writes	|	on
geqo	|	off
geqo_effort	|	5
geqo_generations	|	0
geqo_pool_size	|	0
geqo_selection_bias	|	2
geqo_threshold	|	12
gin_fuzzy_search_limit	|	0
ignore_system_indexes	|	off
integer_datetimes	|	off
join_collapse_limit	|	8
krb_caseins_users	|	off
krb_server_hostname	|	
krb_srvname	|	postgres
lc_collate	|	en_US.UTF-8
lc_ctype	|	en_US.UTF-8
lc_messages	|	en_US.UTF-8
lc_monetary	|	en_US.UTF-8
lc_numeric	|	en_US.UTF-8
lc_time	|	en_US.UTF-8
listen_addresses	|	127.0.0.1,208.75.198.149
local_preload_libraries	|	
log_autovacuum_min_duration	|	-1
log_checkpoints	|	off
log_connections	|	off
log_destination	|	stderr
log_disconnections	|	off
log_duration	|	off
log_error_verbosity	|	default
log_executor_stats	|	off
log_hostname	|	off
log_line_prefix	|	
log_lock_waits	|	off
log_min_duration_statement	|	-1
log_min_error_statement	|	error
log_min_messages	|	notice
log_parser_stats	|	off
log_planner_stats	|	off
log_rotation_age	|	0
log_rotation_size	|	0
log_statement	|	none
log_statement_stats	|	off
log_temp_files	|	-1
log_timezone	|	Asia/Kolkata
log_truncate_on_rotation	|	off
logging_collector	|	on
maintenance_work_mem	|	256MB
max_connections	|	100
max_files_per_process	|	1000
max_fsm_pages	|	500000
max_fsm_relations	|	500
max_function_args	|	100
max_identifier_length	|	63
max_index_keys	|	32
max_locks_per_transaction	|	64
max_prepared_transactions	|	5
max_stack_depth	|	5MB
password_encryption	|	on
port	|	5432
post_auth_delay	|	0
pre_auth_delay	|	0
random_page_cost	|	4
regex_flavor	|	advanced
search_path	|	"$user",public
seq_page_cost	|	1
server_encoding	|	UTF8
server_version	|	8.3.8
server_version_num	|	80308
session_replication_role	|	origin
shared_buffers	|	240MB
silent_mode	|	off
sql_inheritance	|	on
ssl	|	off
standard_conforming_strings	|	off
statement_timeout	|	0
superuser_reserved_connections	|	3
synchronize_seqscans	|	on
synchronous_commit	|	on
syslog_facility	|	LOCAL0
syslog_ident	|	postgres
tcp_keepalives_count	|	9
tcp_keepalives_idle	|	7200
tcp_keepalives_interval	|	75
temp_buffers	|	1024
temp_tablespaces	|	
TimeZone	|	Asia/Kolkata
timezone_abbreviations	|	Default
trace_notify	|	off
trace_sort	|	off
track_activities	|	on
track_counts	|	on
transaction_isolation	|	read committed
transaction_read_only	|	off
transform_null_equals	|	off
unix_socket_group	|	
unix_socket_permissions	|	511
update_process_title	|	on
vacuum_cost_delay	|	50ms
vacuum_cost_limit	|	200
vacuum_cost_page_dirty	|	20
vacuum_cost_page_hit	|	1
vacuum_cost_page_miss	|	10
vacuum_freeze_min_age	|	100000000
wal_buffers	|	10MB
wal_sync_method	|	fdatasync
wal_writer_delay	|	200ms
work_mem	|	64MB
xmlbinary	|	base64
xmloption	|	content
zero_damaged_pages	|	off
(176 rows)	

Today we did more analysis and observed  postgress processes that
continually reported status 'D' in top. The corresponding vmstat showed
a proportionate amount of processes under the 'b' column,
"uninterruptible" state.

We've been able to match  long running database queries to such
processes. This occurs under relatively low load average (say 4 out of
8) and can involve as little as 1 single sql query.
It seems that many queries get into that state and that is causing our
load average to spike very high.
Queries are finishing even though we continue to see an increase in
postgres processes in 'D' state.
Are we facing some serious db locking?  What could lead to this?
(The box has 8G and 8 cores)

Thanks for any help,
Anne

-----Original Message-----
From: Andy Colson [mailto:andy@xxxxxxxxxxxxxxx]
Sent: Tuesday, January 25, 2011 2:13 PM
To: Anne Rosset
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re:  Queries becoming slow under heavy load

On 1/25/2011 3:37 PM, Anne Rosset wrote:
> Hi,
>
> We are running some performances tests. With a lot of concurrent 
> access, queries get very slow. When there is no load, those queries
run fast.
>
> We kind of see a trend about these queries: it seems like the ones 
> that become very slow have an ORDER BY or MAX in them.
>
> Here are our config settings:
>

<SNIP>

> It seems to me that we should try increasing shared_buffers. But do 
> you have any other suggestions? Or do you see anything wrong in our
config?
>
> Thanks,
>
> Anne
>

While I applaud your attempt to get us lots of information,
unfortunately the the one property you ask about (shared_buffers), I
can't seem to find.

So, maybe you could post a bit more:

1) how many concurrent clients?
2) can we see an explain analyze for a query when its fast, and then
again when its slow?
3) Is this box dedicated to PG or are there other services running?
4) Looks like you have 8 Gig of ram, so I assume this is a 64 bit OS,
can you tell us what you have for:

shared_buffers
effective_cahce_size
work_mem

5) Once many clients start hitting the db, it might not all fit into ram
and start hitting the HD, can you tell us what sort of IO you have
(sata, scsi, raid, # of disks, etc).

The stats from /proc/meminfo:
SwapTotal:     2097112 kB
SwapFree:      2096612 kB

Was this run when the system was busy?  Looks like you are not using any
swap, so thats good at least.  Oh, wait, there are two cat
/proc/meminfo's.  Is one when its fast and one when its slow?

Looks to me, in both cases, you are not using much memory at all.  (if
you happen to have 'free', its output is a little more readable, if you
wouldn't mind posting it (only really need it for when the box is slow)

-Andy

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance