Hi Jeff,
Thanks a lot for such a detailed guide!
于 2013/8/21 8:24, Jeff Janes 写道:
OK, that certainly does sound like network problems and not disk
contention. But what I don't see is why it would be listed as
"active" in pg_stat_activity. If it is blocked on a network
connection, I would think it would show 'idle'.
strace and gdb are aggressive forms of monitoring, so you should talk
to your sys admins before using them (if your organization has that
type of segregation of duties) or read their documentation.
It is best to run them on a test system if you can get the problem to
occur there. If you can't, then I would be willing to run them
against a hung backend on a production system, but I might be more
adventurous than most.
I have occasionally seen strace cause the straced program to
seg-fault, but I've only seen this with GUI programs and never with
postgresql.
In theory attaching gdb could make a backend pause while holding a
spinlock, and if left that way for long enough could cause a
postgresql-wide panic over a stuck spinlock. But I've never seen this
happen without intentionally causing. If the backend you attach is
truly hung and hasn't already causes a panic, then it almost surely
can't be happen, but just to be sure you should quit gdb as soon as
possible so that the backend can continue unimpeded.
Anyway, both are fairly easy to run once you find the pid of one of a
stuck backend (either using ps, or using pg_stat_activity). Then you
give the pid to the debugging program with -p option.
You probably have to run as the postgres user, or you won't have
permissions to attach to the backend.
With strace, once you attach you will see a stream of system calls go
to your screen, until you hit ctrl-C to detach. But if the backend is
hung on a network connection, there should really only be one system
call, and then it just wait until you detach or the network connection
times out, like this:
$ strace -p 21116
Process 21116 attached - interrupt to quit
recvfrom(9,
So it is waiting on a recvfrom call, and that call never returns until
I got sick of waiting and hit ctrl-C. Not very interesting, but it
does show it is indeed stuck on the network
for gdb, it is similar to invoke:
$ gdb -p 21116
and it then produces several screenfuls of diagnostic gibberish and
gives you an interactive command-line environment. Once attached, you
want to get a backtrace ("bt", return), and then quit promptly ("q",
return, "y").
That produces something like this:
Loaded symbols for /lib64/libnss_files.so.2
0x00000032a80e9672 in __libc_recv (fd=<value optimized out>,
buf=0xb33f60, n=8192, flags=0) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:30
30 return INLINE_SYSCALL (recvfrom, 6, fd, buf, n, flags,
NULL, NULL);
(gdb) bt
#0 0x00000032a80e9672 in __libc_recv (fd=<value optimized out>,
buf=0xb33f60, n=8192, flags=0) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:30
#1 0x00000000005a4846 in secure_read (port=0x22f1190, ptr=0xb33f60,
len=8192) at be-secure.c:304
#2 0x00000000005ae33b in pq_recvbuf () at pqcomm.c:824
#3 0x00000000005ae73b in pq_getbyte () at pqcomm.c:865
#4 0x0000000000651c11 in SocketBackend (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x22d2a28 "jjanes", username=<value
optimized out>)
at postgres.c:342
#5 ReadCommand (argc=<value optimized out>, argv=<value optimized
out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>) at
postgres.c:490
#6 PostgresMain (argc=<value optimized out>, argv=<value optimized
out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>) at
postgres.c:3923
#7 0x000000000060e861 in BackendRun () at postmaster.c:3614
#8 BackendStartup () at postmaster.c:3304
#9 ServerLoop () at postmaster.c:1367
#10 0x00000000006113b1 in PostmasterMain (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1127
#11 0x00000000005b0440 in main (argc=3, argv=0x22d1cb0) at main.c:199
(gdb) q
A debugging session is active.
Inferior 1 [process 21116] will be detached.
Quit anyway? (y or n) y
Detaching from program: /usr/local/pgsql9_2/bin/postgres, process 21116
Cheers,
Jeff
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance