> Roberts, Jon wrote: > >> Tom Lane wrote: > >>> "Roberts, Jon" <Jon.Roberts@xxxxxxxxxxx> writes: > >>>> Version: "PostgreSQL 8.3.0, compiled by Visual C++ build 1400" > >>> Well, there are plenty of known bugs in 8.3.0 by now. You really > >>> should update before complaining, not after. > >> Yes. And the traditional question should be asked - is there any > >> antivirus or other "personal security" software running on tihs > > machine? > >> If so, uninstall (not just disable!) it and see if the problem goes > > away. > > > > I am not able to un-install this. However, this problem only started as > > the database grew in size and usage. It is 232 MB in size now and it > > has 30 or so active sessions 24x7. > > But you do run antivirus on the machine? Which antivirus? We are running Computer Associates eTrust Antivirus. The realtime scanner has an exclusion of e:\PostgreSQL and all subdirectories. > > It's quite possible that this is the reason. As the files grow, the AV > may take longer to do whatever crap it's doing to them, thus exceeding > timeouts. > The log files are located here: e:\PostgreSQL\data\pg_log Before I had that exclusion on the directory, the database would crash because data files would get locked. We don't get those errors now at all so I don't think this problem is caused by the antivirus program. We do have lots of sessions running at once. It seems that the extra load on the server could also cause it to exceed the timeout. > In general, you can get all sorts of strange things when you run AV on > your database server. The general recommendation is never to do that if > you want things to work, and it will be the first thing you're told to > remove. > I agree to an extent. I've used Oracle and SQL Server on Windows and when the antivirus program scans any database files, it screws it up. However, I've never had a problem with an antivirus program once the database directories were excluded. > It would be good if you could at least temporarily remove it and see if > it fixes the issue. > I can't do that. In this corporate environment, I would get shot if I did that. > > >> It would also be interesting to know if there are actually any other > >> processes running at this time. > >> > > > > Yes, there were about 30 active sessions executing functions. > > Right, but are the processes still active at the moment whrere it > crashes, or has the postmaster managed to kill them off for the restart? > Yes, there are active sessions. Could this be the problem? We have a queue table in which we only allow x number of concurrent jobs. A job is first inserted into the Queue with a Status of "Queued". Next the job executes a function to update the Status to "Processing". Function Details: 1. Creates a db_link 2. "lock table gp_load.queue in access exclusive mode" 3. executes this in the linked session: select count(*) from gp_load.queue where status = 'Processing' 4. if current jobs less than max jobs allowed then Update gp_load.queue set status = 'Processing' Commit 5. Close connection 6. Returns the status of still Queued or Processing I have to use a dblink because PostgreSQL doesn't have autonomous transactions. We call a single function to do all of this work and the other jobs need to see the current Status so they know if they should wait or run right now. The calling function uses a while loop to execute this function. If the function returns Queued, it means it was unable to change the status so it uses pg_sleep to wait 10 seconds and then it trys again until it is able to set the status to Processing. Sorry for the lengthy answer but yes, we could have active sessions using pg_sleep waiting for a slot in the Queue to process a job. The log file shows that this happened too. Jon