Re: Very long execution time of "select nextval('..');"

mljv@xxxxxxxxxxxx · Mon, 28 Jan 2008 12:19:11 +0100

Hi Greg,

first fo all: thanks a lot. i think i understood most of your comments, but - 
of course - have now more questions than before :-)

Am Montag, 28. Januar 2008 01:07 schrieb Greg Smith:
> On Sun, 27 Jan 2008, mljv@xxxxxxxxxxxx wrote:
> > ok, at the moment i got some traffic and my load is at 1.5. But now with
> > logging the timestamp I have seen that the long durations are quite
> > regular at intervals of 10 minutes.
>
> Sure sounds like checkpoints.  You should turn on the checkpoint warning
> feature so it always triggers and see if the long queries completely just
> after the checkpoints finish.  

i read und tried to understand all about checkpoints.
i have set 
   checkpoint_warning=3600 
and restartet Postgresql but i dont see any checkpoint warnings inside my logs 
with 
   log_min_messages = info
   log_min_error_statement = warning

so i think a checkpoint warning should pop up frequently as i have 
  checkpoint_segments = 10
  checkpoint_timeout = 300
so at least each five minutes a warning should be shown, but it isn't. 

> Notes on that and what you can do to 
> possibly improve checkpoint behavior are at
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm The
> early parts of that mostly refer to 8.2 but 8.1 is basically the same in
> this area.

so this what i have done to tune checkpoints. Please let me know if it sounds 
reasonable. It list all parameters which in my opinion affect write 
performance and which dont have their default value

fsync = on
wal_buffers = 16
bgwriter_delay = 200                   
bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5
bgwriter_all_maxpages = 200

checkpoint_timeout = 300               
checkpoint_segments = 10
checkpoint_warning = 3600

shared_buffers = 30000
work_mem = 10240
maintenance_work_mem = 163840
max_fsm_pages = 500000
effective_cache_size = 180000 
random_page_cost = 3

autovacuum = on                 
autovacuum_naptime = 3000

stats_start_collector = on
stats_command_string = off
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on

but after a restart i still got lots of queries above 500ms (my new value for 
log_min_duration) and still some of them are so simple as select nextval. 
(Logs are shortend for your convinience):

11:53:48 duration: 14473.594 ms  EXECUTE <unnamed>  [PREPARE:  select nextval
11:53:48 duration: 12527.846 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:53:48 duration: 12488.325 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:53:48 duration: 10300.745 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:53:48 duration: 4638.686 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:53:48 duration: 4012.015 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:53:48 duration: 2087.131 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:53:48 duration: 11669.099 ms  EXECUTE <unnamed>  [PREPARE:  select nextval 
11:54:14 duration: 2007.563 ms  EXECUTE <unnamed>  [PREPARE:  select nextval

> > but what in hell can make nextval take so long? even if checkpointing is
> > badly configured.
>
> You're in a situation where your amount of RAM far exceeds your disk I/O
> capabilities.  Brutally bad checkpoints are easy to encounter in that
> setup.  Linux by default will use 10% of RAM to hold writes.  At
> checkpoint time, that entire Linux buffer cache has to be cleared of
> database writes on top of what's written by the checkpoint itself.  How
> long do you think it takes to write >800MB of database data with a
> significant random-access component to it when your disk is a simple
> RAID-1?  20 seconds is not out of the question.

ok. But is it something you really recommend? I fear changing such values as i 
do not understand their impact.

> You may want to significantly reduce the size of the Linux write buffer
> and see if that helps.
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm goes over theory
> and suggestions here.
>
> > I always thought that nextval is one of the fastest operations.
> > So if it takes 500 ms, fine. things like this can always happen, but 20
> > seconds sounds more like a hardware failure. But i can't see any.
>
> Just about everything gets blocked behind the worse checkpoint spikes.
> The thing that kind of bothers me about your case is that I'd expect other
> queries would also be blocked and you'd have a whole set of >250ms ones
> lined up just after the checkpoint is done.  That you're only reporting
> issues with nextval makes me wonder if there isn't some other locking
> driving the main behavior, perhaps something that just gets worse at
> checkpoint time rather than being directly caused by it.

i have many queries far above  250ms. I just choosed to report nextval 
statements only because other statements are rather complicated and i would 
just get an answer about "explain analyze" which is not my problem. 

I appreciate your help very much. thanks a lot.

kind regards,
janning

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match