Hi Greg, first fo all: thanks a lot. i think i understood most of your comments, but - of course - have now more questions than before :-) Am Montag, 28. Januar 2008 01:07 schrieb Greg Smith: > On Sun, 27 Jan 2008, mljv@xxxxxxxxxxxx wrote: > > ok, at the moment i got some traffic and my load is at 1.5. But now with > > logging the timestamp I have seen that the long durations are quite > > regular at intervals of 10 minutes. > > Sure sounds like checkpoints. You should turn on the checkpoint warning > feature so it always triggers and see if the long queries completely just > after the checkpoints finish. i read und tried to understand all about checkpoints. i have set checkpoint_warning=3600 and restartet Postgresql but i dont see any checkpoint warnings inside my logs with log_min_messages = info log_min_error_statement = warning so i think a checkpoint warning should pop up frequently as i have checkpoint_segments = 10 checkpoint_timeout = 300 so at least each five minutes a warning should be shown, but it isn't. > Notes on that and what you can do to > possibly improve checkpoint behavior are at > http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm The > early parts of that mostly refer to 8.2 but 8.1 is basically the same in > this area. so this what i have done to tune checkpoints. Please let me know if it sounds reasonable. It list all parameters which in my opinion affect write performance and which dont have their default value fsync = on wal_buffers = 16 bgwriter_delay = 200 bgwriter_lru_percent = 10.0 bgwriter_lru_maxpages = 100 bgwriter_all_percent = 5 bgwriter_all_maxpages = 200 checkpoint_timeout = 300 checkpoint_segments = 10 checkpoint_warning = 3600 shared_buffers = 30000 work_mem = 10240 maintenance_work_mem = 163840 max_fsm_pages = 500000 effective_cache_size = 180000 random_page_cost = 3 autovacuum = on autovacuum_naptime = 3000 stats_start_collector = on stats_command_string = off stats_block_level = on stats_row_level = on stats_reset_on_server_start = on but after a restart i still got lots of queries above 500ms (my new value for log_min_duration) and still some of them are so simple as select nextval. (Logs are shortend for your convinience): 11:53:48 duration: 14473.594 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 12527.846 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 12488.325 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 10300.745 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 4638.686 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 4012.015 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 2087.131 ms EXECUTE <unnamed> [PREPARE: select nextval 11:53:48 duration: 11669.099 ms EXECUTE <unnamed> [PREPARE: select nextval 11:54:14 duration: 2007.563 ms EXECUTE <unnamed> [PREPARE: select nextval > > but what in hell can make nextval take so long? even if checkpointing is > > badly configured. > > You're in a situation where your amount of RAM far exceeds your disk I/O > capabilities. Brutally bad checkpoints are easy to encounter in that > setup. Linux by default will use 10% of RAM to hold writes. At > checkpoint time, that entire Linux buffer cache has to be cleared of > database writes on top of what's written by the checkpoint itself. How > long do you think it takes to write >800MB of database data with a > significant random-access component to it when your disk is a simple > RAID-1? 20 seconds is not out of the question. ok. But is it something you really recommend? I fear changing such values as i do not understand their impact. > You may want to significantly reduce the size of the Linux write buffer > and see if that helps. > http://www.westnet.com/~gsmith/content/linux-pdflush.htm goes over theory > and suggestions here. > > > I always thought that nextval is one of the fastest operations. > > So if it takes 500 ms, fine. things like this can always happen, but 20 > > seconds sounds more like a hardware failure. But i can't see any. > > Just about everything gets blocked behind the worse checkpoint spikes. > The thing that kind of bothers me about your case is that I'd expect other > queries would also be blocked and you'd have a whole set of >250ms ones > lined up just after the checkpoint is done. That you're only reporting > issues with nextval makes me wonder if there isn't some other locking > driving the main behavior, perhaps something that just gets worse at > checkpoint time rather than being directly caused by it. i have many queries far above 250ms. I just choosed to report nextval statements only because other statements are rather complicated and i would just get an answer about "explain analyze" which is not my problem. I appreciate your help very much. thanks a lot. kind regards, janning ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match