On Wednesday, September 14, 2022 at 07:19:31 AM -0700, Adrian Klaver wrote:

> On 9/14/22 01:31, Matthias Apitz wrote:
> >
> > We have a C-written application server which uses ESQL/C on top
> > of PostgreSQL 13.1 on Linux. The application in question always serves
> > the same search in a librarian database, given to the server
> > as commands over the network, login into the application and doing
> > a search:
> >
> > SLNPServerInit
> > User:zfl
> > SLNPEndCommand
> >
> > SLNPSearch
> > HitListName:Zfernleihe
> > Search:1000=472214284
> > SLNPEndCommand
> >
> > To fulfill the search, the application server has to do some 100
> > ESQL/C calls and all this should not take longer than 1-2 seconds, and
> > normally it does not take longer. But, in some situations it takes
> > longer than 180 seconds, in 10% of the cases. The other 90% are below 2 seconds,
> > i.e. this is digital: Or 2 seconds, or more than 180 seconds, no values between.
> >
> > We can easily simulate the above with a small shell script just sending over
> > the above two commands with 'netcat' and throwing away its result (the real search is
> > done by an inter library loan software which has an timeout of 180 seconds
> > to wait for the SLNPSearch search result -- that's why we got to know
> > about the problem at all, because all this is running automagically with
> > no user dialogs). The idea of the simulated search was to get to know
> > with the ESQL/C log files which operation takes so long and why.
>
> Does the test search run the inter library loan software?

No. The real picture is:

ILL-software --(network, search command)---> app-server --(ESQL/C)--> PostgreSQL-server
test search  --(localhost, search command)-> app-server --(ESQL/C)--> PostgreSQL-server

> > Well, since some day, primary to catch the situation, we send over every
> > 10 seconds this simulated searches and since then the problem went away at all.
>
> To be clear the problem went away for the real search?

Yes. Since the 'test search' has been running every 10 seconds, the
'ILL-software' pictured above, which does the same search, no longer
faces the problem.

> Where is the inter library software, in your application or are you reaching
> out to another application?

The 'app-server' above fulfills the search requested by the
'ILL-software' (or by the 'test search'), i.e. it looks up one single
librarian record (one row in the PostgreSQL database) and delivers it
to the 'ILL-software'. The load from the 'ILL-software' is light:
about 50 requests per day.

> Is the search running across a remote network?

The real search comes over the network through a stunnel. But we
watched the incoming search and the response of the 'app-server'
locally with tcpdump. In the timeout case, the 'app-server' does not
answer within 180 seconds, i.e. it sends nothing into the stunnel, and
the remote 'ILL-software' terminates the connection with a FIN packet.

I will now:

- shut down the test search that runs every 10 seconds, to see if the
  problem reappears
- set 'log_autovacuum_min_duration = 0' in postgresql.conf, to see if
  the times of the problem match the autovacuum runs

Thanks for your feedback in any case.

	matthias

-- 
Matthias Apitz, ✉ guru@xxxxxxxxxxx, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
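
For reference, a minimal sketch of the postgresql.conf change named in the plan above, assuming an otherwise stock PostgreSQL 13 configuration. Only the log_autovacuum_min_duration line comes from the message itself; the comments are illustrative.

```conf
# postgresql.conf (fragment)
# Log every autovacuum action regardless of how long it ran
# (0 = log all actions, -1 = no autovacuum logging), so that the
# log timestamps can be compared with the times of the slow searches.
log_autovacuum_min_duration = 0
```

This setting does not need a server restart; reloading the configuration, for example with `SELECT pg_reload_conf();` as a superuser, should be enough.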