Re: How to investiage slow insert problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jeff,

Thanks a lot for such a detailed guide!

于 2013/8/21 8:24, Jeff Janes 写道:
OK, that certainly does sound like network problems and not disk 
contention.  But what I don't see is why it would be listed as 
"active" in pg_stat_activity.  If it is blocked on a network 
connection, I would think it would show 'idle'.
strace and gdb are aggressive forms of monitoring, so you should talk 
to your sys admins before using them (if your organization has that 
type of segregation of duties) or read their documentation.
It is best to run them on a test system if you can get the problem to 
occur there.  If you can't, then I would be willing to run them 
against a hung backend on a production system, but I might be more 
adventurous than most.
I have occasionally seen strace cause the straced program to 
seg-fault, but I've only seen this with GUI programs and never with 
postgresql.
In theory attaching gdb could make a backend pause while holding a 
spinlock, and if left that way for long enough could cause a 
postgresql-wide panic over a stuck spinlock.  But I've never seen this 
happen without intentionally causing.  If the backend you attach is 
truly hung and hasn't already causes a panic, then it almost surely 
can't be happen, but just to be sure you should quit gdb as soon as 
possible so that the backend can continue unimpeded.
Anyway, both are fairly easy to run once you find the pid of one of a 
stuck backend (either using ps, or using pg_stat_activity).  Then you 
give the pid to the debugging program with -p option.
You probably have to run as the postgres user, or you won't have 
permissions to attach to the backend.
With strace, once you attach you will see a stream of system calls go 
to your screen, until you hit ctrl-C to detach.  But if the backend is 
hung on a network connection, there should really only be one system 
call, and then it just wait until you detach or the network connection 
times out, like this:
$ strace -p 21116
Process 21116 attached - interrupt to quit
recvfrom(9,

So it is waiting on a recvfrom call, and that call never returns until I got sick of waiting and hit ctrl-C. Not very interesting, but it does show it is indeed stuck on the network
for gdb, it is similar to invoke:

$ gdb -p 21116

and it then produces several screenfuls of diagnostic gibberish and gives you an interactive command-line environment. Once attached, you want to get a backtrace ("bt", return), and then quit promptly ("q", return, "y").
That produces something like this:

Loaded symbols for /lib64/libnss_files.so.2
0x00000032a80e9672 in __libc_recv (fd=<value optimized out>, buf=0xb33f60, n=8192, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:30 30 return INLINE_SYSCALL (recvfrom, 6, fd, buf, n, flags, NULL, NULL);
(gdb) bt
#0 0x00000032a80e9672 in __libc_recv (fd=<value optimized out>, buf=0xb33f60, n=8192, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:30 #1 0x00000000005a4846 in secure_read (port=0x22f1190, ptr=0xb33f60, len=8192) at be-secure.c:304
#2  0x00000000005ae33b in pq_recvbuf () at pqcomm.c:824
#3  0x00000000005ae73b in pq_getbyte () at pqcomm.c:865
#4 0x0000000000651c11 in SocketBackend (argc=<value optimized out>, argv=<value optimized out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>)
    at postgres.c:342
#5 ReadCommand (argc=<value optimized out>, argv=<value optimized out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>) at postgres.c:490 #6 PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x22d2a28 "jjanes", username=<value optimized out>) at postgres.c:3923
#7  0x000000000060e861 in BackendRun () at postmaster.c:3614
#8  BackendStartup () at postmaster.c:3304
#9  ServerLoop () at postmaster.c:1367
#10 0x00000000006113b1 in PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1127
#11 0x00000000005b0440 in main (argc=3, argv=0x22d1cb0) at main.c:199
(gdb) q
A debugging session is active.

        Inferior 1 [process 21116] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/local/pgsql9_2/bin/postgres, process 21116

Cheers,

Jeff


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux