Re: rebellious pg stats collector (reopened case)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alvaro Herrera wrote:
Laszlo Nagy wrote:
PostgreSQL 8.3.5, the system is now stable (uptime > 10 days). PostgreSQL stats collector uses 100% CPU forever:

Could you grab a few stack traces from it and post them?  Also possibly
useful, leave strace running on the pgstat process for a while and post
the output somewhere.
Very interesting results. Before I used 'truss' on the process, it was:

78816 pgsql 1 106 0 22400K 7100K CPU6 6 24.2H 99.02% postgres

After I started "truss -p 78815" the first message I got was:

SIGNAL 17 (SIGSTOP)

and the it waited for some seconds. Then my screen was filled with messages like this:

poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M^X\^A\0\0\M^T\f\r\0"...,1000,0x0,NULL,0x0) = 408 (0x198)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\b\0\0\0@\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 (0x40)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\b\0\0\0@\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 (0x40)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^O\v\b\0\^B"...,1000,0x0,NULL,0x0) = 216 (0xd8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 (0x1f8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^T\f\r\0\^B"...,1000,0x0,NULL,0x0) = 216 (0xd8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 (0x1f8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^T\f\r\0\^B"...,1000,0x0,NULL,0x0) = 216 (0xd8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 (0x1f8)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)
recvfrom(7,"\b\0\0\0@\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 (0x40)
poll({7/POLLIN|POLLERR},1,2000)             = 1 (0x1)

I terminated the debugger with Ctrl+C. Then I started it again:


#truss -p 78815
SIGNAL 17 (SIGSTOP)
poll({7/POLLIN|POLLERR},1,2000)                  = 0 (0x0)
getppid(0x1,0x1,0x7d0,0x8014606cc,0xffffffff80a579c0,0x7fffffffceb8) = 95860 (0x17674)
open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4)
fstat(4,{mode=-rw------- ,inode=73223196,size=0,blksize=4096}) = 0 (0x0)
write(4,"\M^W\M-<\M-%\^A\^F\n\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-Z\^C\0\0\0\0\0\0\M-d\M^E\M-0u"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0+\0\0\0\0\0\0\0\^N\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^R\M-(\M^[\v\M-^\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\^N\0\0\0\0\0\0\0\\n\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0~\M-d\M-fp\n\M-^\M-0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"$\0\0\0\0\0\0+\^V\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0Z\M-F%"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0|\^B\0\0\0\0\0\0<\M-I\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0T\M-f\t\^V\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\M-H'\M-4\M-N\n\M-^\M-0A"...,4096) = 4096 (0x1000)
write(4,"\0\0\^N\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\b\0\0\0\0\0L\^B\0\0\0\0\0\0\M^^"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0dD\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\09\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0>\M-h\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T%;\r\0\0\0\0\0 \0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-I\\j\n\M-^\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0Y\240\^A\0\0\0\0\0\^C\M-3"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\^N\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0P\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^P\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\M-P\^R\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,3119) = 3119 (0xc2f)
close(4)                     = 0 (0x0)
rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0)
process exit, rval = 0

I did NOT send the SIGSTOP signal to the process. The pgstat collector was running at 100%CPU for 24 hours, when I attached the debugger, and attaching the debugger somehow caused the process to get a SIGSTOP, but the process did not exit. When I started the debugger for the second time, it got a SIGSTOP again, and this made the collector exiting the endless loop (instead of stopping it).

What is the next step?

  Laszlo


--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux