Search Postgresql Archives

Re: Active sessions does not terminated due to statement_timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



After terminating session file has been updated with new lines.
 
***content from the previous letter***
 
***That one unfinished line. Freeze for ~6 hours.***
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <21328.371590>
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2340453, si_uid=2001} ---
kill(2292604, SIGURG)                   = 0 <0.000056>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000027>
write(2, "\0\0;\1|\373\"\0t2024-03-27 18:43:56 MSK"..., 324) = 324 <0.000027>
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0 <0.000018>
close(64)                               = 0 <0.000041>
close(218)                              = 0 <0.000029>
close(220)                              = 0 <0.000027>
close(219)                              = 0 <0.000023>
unlink("base/121442067/t4_182045966")   = 0 <0.000520>
unlink("base/121442067/t4_182045966.1") = -1 ENOENT (No such file or directory) <0.000087>
unlink("base/121442067/t4_182045966_fsm") = -1 ENOENT (No such file or directory) <0.000060>
unlink("base/121442067/t4_182045966_vm") = -1 ENOENT (No such file or directory) <0.000053>
unlink("base/121442067/t4_182045966_init") = -1 ENOENT (No such file or directory) <0.000061>
unlink("base/121442067/t4_182045963")   = 0 <0.000454>
 
***a lot of unlink***
 
unlink("base/121442067/t4_182045892")   = 0 <0.000046>
unlink("base/121442067/t4_182045892.1") = -1 ENOENT (No such file or directory) <0.000027>
unlink("base/121442067/t4_182045892_fsm") = -1 ENOENT (No such file or directory) <0.000023>
unlink("base/121442067/t4_182045892_vm") = -1 ENOENT (No such file or directory) <0.000021>
unlink("base/121442067/t4_182045892_init") = -1 ENOENT (No such file or directory) <0.000027>
close(407)                              = 0 <0.000032>
stat("base/pgsql_tmp/pgsql_tmp2292604.0", {st_mode=S_IFREG|0600, st_size=19141117, ...}) = 0 <0.000024>
unlink("base/pgsql_tmp/pgsql_tmp2292604.0") = 0 <0.010861>
sendto(9, "\24\0\0\0\30\0\0\0\23\17=\7]U\0\0\375\21$\1\0\0\0\0", 24, 0, NULL, 0) = 24 <0.000067>
write(2, "\0\0\265\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 190) = 190 <0.000030>
brk(0x555d798fa000)                     = 0x555d798fa000 <0.002482>
brk(0x555d78902000)                     = 0x555d78902000 <0.003686>
brk(0x555d76f75000)                     = 0x555d76f75000 <0.008538>
openat(AT_FDCWD, "global/pg_filenode.map", O_RDONLY) = 64 <0.000043>
read(64, "\27'Y\0+\0\0\0\356\4\0\0r\316Y\10\224\v\0\0y\316Y\10\275\4\0\0\177\316Y\10"..., 512) = 512 <0.000022>
close(64)                               = 0 <0.000019>
openat(AT_FDCWD, "base/121442067/pg_filenode.map", O_RDONLY) = 64 <0.000027>
read(64, "\27'Y\0\21\0\0\0\353\4\0\0\3\26?\10\341\4\0\0\334\32?\10\347\4\0\0\347\4\0\0"..., 512) = 512 <0.000018>
close(64)                               = 0 <0.000018>
close(20)                               = 0 <0.000019>
 
***a lot of close***
 
close(138)                              = 0 <0.000019>
openat(AT_FDCWD, "base/121442067/138352131", O_RDWR) = 5 <0.000052>
lseek(5, 0, SEEK_END)                   = 149766144 <0.000018>
openat(AT_FDCWD, "base/121442067/2601", O_RDWR) = 6 <0.000029>
lseek(6, 0, SEEK_END)                   = 8192 <0.000017>
kill(21, SIGURG)                        = 0 <0.000035>
sendto(9, "\31\0\0\0\20\0\0\0\23\17=\7\4\0\0\0", 16, 0, NULL, 0) = 16 <0.000078>
sendto(9, "\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960 <0.000036>
sendto(9, "\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960 <0.000041>
 
***a lot of sendto***
 
sendto(9, "\20\0\0\0H\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\0\0\0\0"..., 72, 0, NULL, 0) = 72 <0.000030>
write(2, "\0\0\322\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 219) = 219 <0.000050>
futex(0x7fd5aacbf6c4, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000020>
exit_group(1)                           = ?
+++ exited with 1 +++
 
PS: I guess "No such file or directory" messages due to execution of strace on host, and postgres process is executing inside docker.

Среда, 27 марта 2024, 18:39 +03:00 от Ц <pfunk@xxxxxxx>:
 
I was able to connect to this process using
 
strace -T -p `pgrep -n -u ks-postgres -f "ocb.*FETCH"` -o strace_of_active_session -ff
(the -T flag gives us the time the system call has been running at the end of each line).
 
If the session had been hanging active for some time, there was only one unfinished line in strace:
epoll_wait(4,
 
 
But if we got it a bit earlier, we got this output:
 
***something before process was catched***
 
epoll_wait(4, [{EPOLLOUT, {u32=1938706152, u64=93859859025640}}], 1, -1) = 1 <0.003515>
sendto(10, "1.211 \320\224\320\232Z200.21121002 \320\240\320\232211)"..., 1968, 0, NULL, 0) = 1968 <0.000068>
sendto(10, " \320\277\320\276\320\273 \321\204\320\265\320\262\321\200\320\260\320\273\321\214 \320\235\320\224\320\241 \320\275"..., 8192, 0, NULL, 0) = 8192 <0.000051>
pread64(407, "y\221\275\0\0\0\0\0\0\342\275[\0\325\320\244\320\232\320\243 \320\220\320\264\320\274\320\270\320\275\320"..., 8192, 10354688) = 8192 <0.000065>
sendto(10, "\377\377\0\0\0\270(907.0702.1340380180.111.2"..., 8192, 0, NULL, 0) = 8192 <0.000066>
pread64(407, "00057476 \320\276\321\202 31.01.2024 \320\262 \321\202.\321"..., 8192, 10362880) = 8192 <0.000044>
sendto(10, "7233430 \320\225\320\264\320\270\320\275\321\213\320\271 \320\275\320\260\320\273\320\276\320\263\320"..., 8192, 0, NULL, 0) = 8192 <0.000073>
 
***a lot of pread64, sendto***
 
sendto(10, "010006300\377\377\377\377\0\0\0X\320\241\320\274\320\276\320\273\320\265\320\275\321\201\320"..., 8192, 0, NULL, 0) = 8192 <0.000023>
pread64(407, "901001\10\2\0\0\30\0\3\0 y\221\235\0\0\0\0\0\0\34\202[\0\20\2\0\0"..., 8192, 16949248) = 8192 <0.000020>
sendto(10, "\321\200\320\265\320\265\321\201\321\202\321\200\321\203 \342\204\226 24119 \320\276\321\202 19"..., 8192, 0, NULL, 0) = 8192 <0.000024>
sendto(10, ") \320\273/\321\20120907234120  \320\221\320\23690741220"..., 8192, 0, NULL, 0) = 8192 <0.000022>
pread64(407, "\320\275\321\201\320\272\320\260 (\320\243\320\277\321\200\320\260\320\262\320\273\320\265\320\275\320\270\320\265 \320"..., 8192, 16957440) = 8192 <0.000019>
sendto(10, "632\0\0\0\n7707083893\0\0\0\0\0\0\0\t6730020"..., 8192, 0, NULL, 0) = 8192 <0.000022>
pread64(407, "\1\0\0\30\0\3\0 y\221\275\0\0\0\0\0\0\356\367[\0\375\320\244\320\232\320\243 \320\220\320"..., 8192, 16965632) = 8192 <0.000020>
sendto(10, "\0\0\0\02403234643667010006300\377\377\377\377\0\0\0d"..., 8192, 0, NULL, 0) = 6632 <0.000026>
sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <2.855325>
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=2292607, si_uid=2001} ---
kill(2292604, SIGURG)                   = 0 <0.000023>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000052>
sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = 1560 <0.000033>
pread64(407, "\320\265\321\202.\23\1\202\4\0J\6\210\23\025044525411\0276732208"..., 8192, 16973824) = 8192 <0.000026>
sendto(10, "2.24,4085 \320\276\321\202 21.02.24 \320\272-\321\202 01"..., 8192, 0, NULL, 0) = 8192 <0.000025>
pread64(407, "\276\320\263. 59003491 \320\276\321\202 16.11.2007 \320\235"..., 8192, 16982016) = 8192 <0.000020>
 
***a lot of pread64, sendto***
 
pread64(407, "\260 (\320\234\320\221\320\236\320\243 \"\320\241\320\250 \342\204\226 11\")+40702"..., 8192, 17375232) = 8192 <0.000021>
sendto(10, "\274\320\276\320\273\320\265\320\275\321\201\320\272\320\276\320\265 \320\276\321\202\320\264\320\265\320\273\320\265\320\275"..., 8192, 0, NULL, 0) = 4000 <0.000025>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000018>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000020>
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <4.214523>
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
kill(2292604, SIGURG)                   = 0 <0.000028>
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1790, tv_usec=905739}}, NULL) = 0 <0.000020>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000018>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000023>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000017>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000021>
 
***First hang in strace output for  <1790.905479> seconds***
***It was looking like last line in this file***
***epoll_wait(4,  ***
 
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <1790.905479>
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
kill(2292604, SIGURG)                   = 0 <0.000035>
kill(-2292604, SIGINT)                  = 0 <0.000020>
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2292604, si_uid=2001} ---
rt_sigreturn({mask=[ALRM URG]})         = 0 <0.000018>
kill(2292604, SIGINT)                   = 0 <0.000017>
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2292604, si_uid=2001} ---
rt_sigreturn({mask=[ALRM URG]})         = 0 <0.000016>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000016>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000025>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000020>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000018>
epoll_wait(4,
 
***last line looks incomplete and it hangs forever***
 
 
 
When I first noticed that the file was not growing, the last line was the same incomplete one
epoll_wait(4 ,
and I thought that nothing would change.
 
But later I noticed that the file had changed.
The line was formed in:
epoll_wait(4, 0x555d738e4f30, 1, -1) = -1 EINTR (Interrupted system call) <1790.905479>
telling us that epoll_wait had been hanging for 30 minutes.
 
An there were added 13 more lines with the same ending epoll_wait(4,
...but it was already final.
 
Here are the values of the possibly involved parameters:
 
                name                 | setting | unit
-------------------------------------+---------+------
 deadlock_timeout                    | 1000    | ms
 idle_in_transaction_session_timeout | 30000   | ms
 idle_session_timeout                | 60000   | ms
 statement_timeout                   | 1800000 | ms
 tcp_keepalives_count                | 4       |
 tcp_keepalives_idle                 | 30      | s
 tcp_keepalives_interval             | 5       | s
 tcp_user_timeout                    | 0       | ms
 
 
PS: found an issue in node-postgres with similar symptoms but can't figure out how their solution can be applied in my case.
 
 
Вторник, 26 марта 2024, 17:19 +03:00 от Ц <pfunk@xxxxxxx>:
 
Greetings!
I’ve faced with strange behavior when I see a lot of active sessions started hours ago while statement_timeout = '30min'.
All of them are fetching from cursors.
 
Typical session looks like:
backend_start    | 2024-03-26 14:34:20.552594+03
xact_start           | 2024-03-26 14:34:54.974628+03
query_start         | 2024-03-26 14:35:02.024133+03
state_change     | 2024-03-26 14:35:02.024134+03
wait_event_type | Client
wait_event          | ClientWrite
state                   | active
backend_xid       | 23240392
backend_xmin    | 23226474
query                   | fetch all from "<unnamed portal 20>"
backend_type     | client backend
 
 
They are accumulating up to tens by the end of the day with all negative impacts on performance.
Initially I thought that clients already died but due to network issues database considers them to be alive. So I set tcp_keepalive GUCs to nonzero values. Without success.
Then I checked connections from the app server side and found them in ESTABLISHED state.
It's certainly an application fault and it should not hold cursor forever...but
 
Is the any GUC parameters to fight with such «clients»?
 
 
 
 
 
 
 
 
 
 

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux