I've done some more testing and the problem seems to be repmgr itself. A few details below...

----- Original Message -----
> From: Greg Williamson <gwilliamson39@xxxxxxxxx>
> To: Tom Lane <tgl@xxxxxxxxxxxxx>
> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
> Sent: Thursday, September 27, 2012 7:23 PM
> Subject: Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1
>
> Tom --
>
> ----- Original Message -----
>> From: Tom Lane <tgl@xxxxxxxxxxxxx>
>> To: Greg Williamson <gwilliamson39@xxxxxxxxx>
>> Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
>> Sent: Thursday, September 27, 2012 7:14 PM
>> Subject: Re: Database size stays constant but disk space keeps shrinking -- postgres 9.1
>>
>> Greg Williamson <gwilliamson39@xxxxxxxxx> writes:
>>>> Have you checked to see if there are any processes that have open handles to
>>>> deleted files (lsof -X | grep deleted).
>>
>>> lsof -X | grep deleted | wc -l
>>
>>> shows: 835 such files.
>>
>>> A couple:
>>> postgres 2540 postgres 50u REG 8,3   409600    93429 /var/lib/postgresql/9.1/main/base/2789200/11816 (deleted)
>>> postgres 2540 postgres 51u REG 8,3 18112512 49694570 /var/lib/postgresql/9.1/main/base/2789200/2791679 (deleted)
>>> <...>
>>
>> So, which processes are holding these open, and what are they doing
>> exactly?  Let's see output from ps and pg_stat_activity, maybe even
>> attach to them with gdb and get stack traces.
>>
>>> We've a planned restart scheduled soon which will let me find any
>>> scripts that might be keeping things open,
>>
>> A restart will destroy all the evidence, so let's not be in a hurry
>> to do that before we've identified what's happening.
>>
>> regards, tom lane
>
> Thanks for the suggestions -- I'll post back when I have more info. Many of
> these do not seem to have a link to any identifiable process that is still
> running, but some do and they have pointed me away from the hourly drop /
> rebuild, at least for now. Looks like the stats database may be the issue.
>
> Greg W.

I turned off the cronjob that did the hourly database create / drop and am still leaking disk space, but a bit slower -- only lost 2 gigs overnight.
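In case it's useful, this is roughly how I'm tying the lsof output back to live backends -- just a sketch, with the PID to be filled in from whatever lsof reports (on 9.1 the pg_stat_activity column is procpid, not pid):

    # count deleted-but-still-open files per backend PID (PID is lsof's second column)
    lsof -X | grep deleted | awk '{print $2}' | sort | uniq -c | sort -rn

    # look a suspect PID up in pg_stat_activity
    psql -U postgres -d postgres -c \
      "SELECT procpid, usename, datname, backend_start, current_query
         FROM pg_stat_activity WHERE procpid = <pid from lsof>;"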
While the create/drop process is running I see these data directories:

postgres@db11:~$ ls -lrt 9.1/main/base
total 200
drwx------ 2 postgres postgres     6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres  8192 2012-10-01 08:17 16424
drwx------ 2 postgres postgres 32768 2012-10-01 19:20 3171846

When it is done (note the last directory is now gone):

postgres@db11:~$ ls -lrt 9.1/main/base
total 140
drwx------ 2 postgres postgres     6 2012-09-21 16:36 pgsql_tmp
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 16387
drwx------ 2 postgres postgres 16384 2012-10-01 00:26 1418400
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 2047839
drwx------ 2 postgres postgres  8192 2012-10-01 00:26 11946
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16449
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16392
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 16402
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 11938
drwx------ 2 postgres postgres  8192 2012-10-01 00:27 1
drwx------ 2 postgres postgres  8192 2012-10-01 08:17 16424

When I run lsof -X and grep for deleted files I see these 4 new entries added since the last database create/drop:

ase/3167420/3169915 (deleted)
postgres 21116 postgres 66u REG 8,3 19709952 136501576 /var/lib/postgresql/9.1/main/base/3171846/3174279 (deleted)
postgres 21116 postgres 67u REG 8,3 15450112 136501574 /var/lib/postgresql/9.1/main/base/3171846/3174278 (deleted)
postgres 21116 postgres 68u REG 8,3 28344320 136410873 /var/lib/postgresql/9.1/main/base/3171846/3172541 (deleted)
postgres 21116 postgres 69u REG 8,3 82452480 144333458 /var/lib/postgresql/9.1/main/base/3171846/3174341 (deleted)

All four are held by the repmgr connection:

root@db11:~# ps auxww | grep 21116
postgres 21116  0.0  0.1 100416 32332 ?     Ss   00:26   0:16 postgres: repmgr repmgr 199.9.xxx.yyy(45239) idle
root     25755  0.0  0.0   6440   840 pts/2 S+   19:38   0:00 grep --color=auto 21116

======

With the database create/drop suspended we still see a steady accumulation of open descriptors for deleted files, but at a slower rate:

< /dev/sda3   67G   28G   39G  42% /
---
> /dev/sda3   67G   29G   38G  44% /

Other than abandoning repmgr I don't see a solution. I've posted this to the repmgr discussion group but have had zero responses (and, frankly, am not holding my breath). If anyone has any suggestions I'm all ears.

Thanks for the bandwidth!

Greg W.
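P.S. One thing I may try before the next maintenance window, on the (unverified) assumption that repmgrd simply re-establishes its connection after losing a session: terminate the idle repmgr backend so the kernel can finally release the deleted files it is holding open. A rough sketch, using the PID and role name from the lsof/ps output above:

    -- on 9.1 the pg_stat_activity pid column is procpid
    SELECT procpid, usename, datname, backend_start, current_query
      FROM pg_stat_activity
     WHERE usename = 'repmgr';

    -- pg_terminate_backend() sends SIGTERM to just that backend,
    -- so the rest of the cluster should be untouched
    SELECT pg_terminate_backend(21116);

That would at least reclaim the space without a full restart, though it doesn't explain why repmgr hangs on to descriptors for dropped databases in the first place.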