Re: [HACKERS] ANALYZE getting dead tuple count hopelessly wrong

Stuart Brooks <stuartb@xxxxxxxxx> · Mon, 07 Apr 2008 10:04:51 +0200

Pavan Deolasee wrote:
On Fri, Apr 4, 2008 at 11:10 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

 The
 policy of this project is that we only put nontrivial bug fixes into
 back branches, and I don't think this item qualifies ...

Got it. I will submit a patch for HEAD.

Thanks,
As I mentioned earlier, I patched 8.3.1 with Pavan's patch and have been 
running tests. After a few days I have got postgres to lock up - not 
sure if it is related. Below is a ps from my system (NetBSD 3).

TEST> ps -ax | grep post
1952 ?      IW<s  13:52.24 postgres: writer process
2113 ?      S<s    0:03.04 postgres: logger process
2157 ?      S<s    0:03.12 postgres: autovacuum launcher process
2199 ?      I<s    0:00.04 postgres: metauser metadb [local] SELECT
2472 ?      DW<s 814:23.50 postgres: metauser metadb localhost(65524) 
COMMIT
2661 ?      DW<s   0:11.27 postgres: metauser metadb localhost(65525) idle
2680 ?      S<s    1:18.75 postgres: stats collector process
3156 ?      S<s    0:45.12 postgres: wal writer process
24362 ?      IW<s   0:00.00 postgres: autovacuum worker process
25024 ?      IW<s   0:00.00 postgres: autovacuum worker process
25134 ?      IW<s   0:00.00 postgres: autovacuum worker process
3289 ttyp5  I<     0:01.96 /usr/local/pgsql/bin/postgres -D ../data/metadb

and I was disconnected in my client app with the following message:

[WARN] PSQL:exec - failed in command <SELECT 
relname,n_tup_ins,n_live_tup,n_dead_tup,pg_total_relation_size('s8_0000.' 
|| relname)*10/(1024*1024),last_autovacuum FROM pg_stat_user_tables 
WHERE schemaname='s8_0000' ORDER BY n_tup_ins DESC>
[WARN]   error = 'server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.'
[WARN] ConnectionNB: PQconsumeInput failed with error 'server closed the 
connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.'

The server is still running but I can't access it. A top yields:

load averages:  0.23,  0.23,  0.21    09:53:58
110 processes: 109 sleeping, 1 on processor

Memory: 513M Act, 256M Inact, 1336K Wired, 75M Exec, 557M File, 2776K Free
Swap: 600M Total, 600M Free

 PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 463 root       2    0  6132K   14M select     0:06  0.05%  0.05% kdeinit
2472 postgres -22   -2  4580K    4K mclpl    814:23  0.00%  0.00% <postgres>
2631 root     -22    0   644K    4K mclpl    606:25  0.00%  0.00% <test_writer
1622 root       2    0  8456K   14M select    19:05  0.00%  0.00% kdeinit
1952 postgres   2   -2  3544K    4K netlck    13:52  0.00%  0.00% <postgres>
 233 root       2    0    24M   31M select     4:47  0.00%  0.00% XFree86
 451 root       2    0  3544K   15M select     4:45  0.00%  0.00% kdeinit
  16 root      18    0     0K  182M syncer     3:51  0.00%  0.00% [ioflush]
  17 root     -18    0     0K  182M aiodoned   1:46  0.00%  0.00% [aiodoned]
  15 root     -18    0     0K  182M pgdaemon   1:30  0.00%  0.00% [pagedaemon]
1301 root     -22    0  4092K    4K mclpl      1:23  0.00%  0.00% <kdeinit>
2680 postgres   2   -2  3560K 1588K poll       1:18  0.00%  0.00% postgres
1493 root       2    0  3488K   17M select     1:09  0.00%  0.00% korgac
 461 root       2    0  3748K   16M select     0:57  0.00%  0.00% kdeinit
3156 postgres   2   -2  3448K 1792K select     0:45  0.00%  0.00% postgres
1174 root       2    0  2608K 2928K select     0:40  0.00%  0.00% profiler
1428 root       2    0  3376K   13M select     0:26  0.00%  0.00% kdeinit
2661 postgres -22   -2  4896K    4K mclpl      0:11  0.00%  0.00% <postgres>

I'm not convinced this is a postgresql bug (state=mclpl concerns me), 
but it's the first time I've seen it. I suppose it could be: 
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=35224.

Anything I can do which might help isolating the problem?

Regards
Stuart