Re: rebellious pg stats collector (reopened case)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



OK, here is the new test.

shopzeus# ps l -p 39766 -p 39767 -p 39769
 UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
70 39766 78806 0 96 0 451960 423896 select Ss ?? 8:41.85 postgres: writer process (postgres) 70 39767 78806 0 96 0 451960 7184 select Ss ?? 0:58.75 postgres: wal writer process (postgres) 70 39769 78806 0 96 0 22400 7008 select Ss ?? 14:42.39 postgres: stats collector process (postgres)


top -otime 10


last pid: 7984; load averages: 1.38, 1.51, 1.49 up 16+18:18:22 05:02:20
170 processes: 3 running, 161 sleeping, 1 stopped, 5 zombie
CPU states: % user, % nice, % system, % interrupt, % idle
Mem: 959M Active, 5947M Inact, 562M Wired, 275M Cache, 214M Buf, 164M Free
Swap: 16G Total, 1088K Used, 16G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 78806 pgsql 1 98 0 441M 6960K select 4 67:14 0.39% postgres 39769 pgsql 1 96 0 22400K 7008K select 7 14:43 0.00% postgres 39766 pgsql 1 96 0 441M 414M select 3 8:42 0.00% postgres
10602 root             1  96    0   149M 18916K select 4   5:30  0.00% httpd
67535 root 1 4 0 3720K 1336K kqread 0 3:54 0.00% dovecot 795 root 1 4 0 8188K 1452K kqread 4 3:21 0.00% master 95791 shopzeus 3 116 20 59892K 17440K select 3 2:22 0.00% python
32397 postfix          1   4    0  8280K  1980K kqread 6   1:23  0.00% qmgr
67536 root 1 4 0 5996K 1448K kqread 4 1:07 0.00% dovecot-auth 525 root 1 96 0 4684K 1156K select 3 1:03 0.00% syslogd

In other words, the stats collector is running since 14 hours.

shopzeus# date
Mon Dec 22 05:03:24 EST 2008
shopzeus# ls -l ~pgsql/data/global/pgstat.stat
-rw------- 1 pgsql pgsql 232084 Dec 22 05:03 /usr/local/pgsql/data/global/pgstat.stat

Looks like it is updating the stats file continuously. Now I try to trace:


shopzeus# truss -p 39769
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0X\^B\0\0\M^T\f\r\0\^F\0"...,1000,0x0,NULL,0x0) = 600 (0x258) poll({7/POLLIN|POLLERR},1,2000) ERR#4 'Interrupted system call'
SIGNAL 14 (SIGALRM)
sigreturn(0x7fffffffca70,0x0,0x7fffffffca70,0xffffffff809fb5c0,0x7fffffffffc0,0x7fffffffca60) ERR#4 'Interrupted system call'
getppid(0x1,0x1,0x801901120,0x4,0x7fffffffd39f,0x6) = 11204 (0x2bc4)
open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4)
fstat(4,{mode=-rw------- ,inode=73223197,size=0,blksize=4096}) = 0 (0x0)
write(4,"\M^W\M-<\M-%\^A`\^B\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"l\0\0\0\0\0\0\0\M-T\M-s.\M-`\M-r"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\t\0\0\0\0\0\0\0\^C"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-aKo\M-s\M-a\M-0A\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\^C\0\0\0\0\0\0\0\M-&"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\M-_o\M-T\M-u\M-p"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^A\0\0\0\0\0\0\^?\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0X\M-<"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\b\^D@f\0\0\0\0\0\0\0F \0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0T\M-2\f\b\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0;\0\0\0\0\0\0\0\^R\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T_\n\0\0\0\0\0\0\M-*R\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"r\M^W\M-`\M-p\M-a\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-a\M-p\M-a\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0c\^Q\^A\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\^F\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0Z\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\^D\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,2708) = 2708 (0xa94)
close(4)                     = 0 (0x0)
rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0)
process exit, rval = 0

And the stats collector exited! It cannot be by accident. Last time I checked, it had the same problem: tried to poll something. It was in an inifinte loop, polling something but always got POLERR. Then because I started to trace it, it exited with rval=0 or rval=1.

I do not understand why it exited after I started to trace it? Is this some kind of bug?

Thanks,

  Laszlo


--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux