Hello list, I'm having a problem with a production system.
I have an application that runs 7x24. It is multithreaded, and every thread has its own connection to the database.
We have two types of devices connected to each thread, human operated PCs for billing purposes, and rabbit microcontrollers controlling some electronic devices.
The problem we are having is that once a month approximately, all the microcontrollers attached to their threads (its a common pool of threads), stop responding. Having a closer look at the logs, we found that these threads are stalled at a SELECT NOW() immediately after a BEGIN TRANSACTION. And then the thread stops responding, since it is waiting for that sql command to finish.
The curious part is that all PCs still work ok, even though they also execute SELECT NOW().
This log represents the session for one PC as an example.
02/09/2010 09:26:32.593 | 192.168.203.104:2255 | Thread 17: (CAJA 4@Parking B->Exterior)Thread 17: [TRANSACTION BEGIN]
02/09/2010 09:26:32.593 | (17) | ROLLBACK
02/09/2010 09:26:32.593 | (17) | BEGIN TRANSACTION
02/09/2010 09:26:32.593 | (17) | SELECT NOW()
02/09/2010 09:26:32.593 | 192.168.203.104:2255 | Thread 17: (CAJA 4@Parking B->Exterior)0 ms (errorCode=[200] bytesRead=[17] bytesWritten=[3])
This log represents the session for one thread attached to a device.
02/09/2010 09:15:01.265 | 192.168.203.52:1127 | Thread 8: (SAL2ABJ TCA@Parking B->Exterior)Thread 8: [EVENT AUTH TR 005001040719 02/09/2010 09:15:01 1]
02/09/2010 09:15:01.265 | (8) | BEGIN TRANSACTION
02/09/2010 09:15:01.265 | (8) | SELECT NOW()
and the thread never continues, it never comes by the select now().
The only difference from the database point of view, is that PCs transactions, since they are much more complex, usually start with a ROLLBACK (arguable, it will be eventually removed) and for devices they usually just start with a COMMIT.
Do you have any idea on what could be causing a SELECT NOW() to never come back?
I'm dazzled.
The problem gets solved restarting the server application (and by doing so, all connections to the database are dropped, and after that it doesnt present the problem anymore for at least a month). It has happened approximately 4 or 5 times already.
Thank you,
Eduardo.