Scott Marlowe wrote:
I doubt that as a serious amount of logging is taking place on the box all the time which goes straight to disk. Also, no disk in the world would take more than a minute to spin up...Gunther Mayer wrote:Hi there, We run a small ISP with a FreeBSD/freeradius/postgresql 8.2.4 backend and 200+ users. Authentication happens via UAM/hotspot and I see a lot of authorisation and accounting packets that are handled via PL/PGSQL functions directly in the database. Everything seems to work 100% except that a few times a day I see Jun 6 10:41:31 caligula postgres[57347]: [4-1] radiususer: LOG: duration: 19929.291 ms statement: SELECT fn_accounting_start(...) in my logs. I'm logging slow queries with log_min_duration_statement = 500 in my postgresql.conf. Sometimes another query runs equally slow or even slower (I've seen 139 seconds!!!) a few minutes before or after as well, but then everything is back to normal. Even though I haven't yet indexed my data I know that the system is performant because my largest table (the accounting one) only has 5000+ rows, the entire database is only a few MB's and I have plenty of memory (2GB), shared_buffers = 100MB and max_fsm_pages = 179200. Also from briefly enabling log_parser_stats = on log_planner_stats = on log_executor_stats = on I saw that most queries are 100% satisfied from cache so the disk doesn't even get hit. Finally, the problem seems unrelated to load because it happens at 4am just as likely as at peak traffic time. What the heck could cause such erratic behaviour? I suspect some type ofresource problem but what and how could I dig deeper?Maybe your hard drive is set to spin down after a certain period of idle, and since most all your data is coming from memory, then it might be that on the rare occasion when it needs to hit the drive it's not spun up anymore.
Maybe some other process is cranking up (cron jobs???) that are chewing up all your I/O bandwidth?Hmm, I investigated that too but if that was the case the queries would run slow always at the same time of the day.
Hard to say. Anything in the system logs that would give you a hint? Try correlating them by the time of the slow pgsql queries.Nothing relevant in the system logs at the time of the slow query appearing. I have in the mean time tweaked syslog-ng.conf such that as soon as it detects a "duration: <greater than 500>ms" log message it spawns top and top -m io and redirects the output to file. At least in that way I can check what's keeping the system busy immediately *after* a slow query has occured. Of course now Murphy's law has it that since I've done that (30 hours ago) not a single slow query has fired, but hey, I'll look at the results once I have them.
On another note, autovacuum couldn't cause such issues, could it? I do have autovacuum enabled (autovacuum=on as well as stats_start_collector=on, stats_block_level = on and stats_row_level = on), is there any possibility that autovacuum is not as resource friendly as advertised?
Gunther