Re: Maintenance question / DB size anomaly...

Kurt Overberg <kurt@xxxxxxxxxxxxxxxxx> · Wed, 20 Jun 2007 20:43:57 -0400

Okay,

 select * from _my_cluster.sl_log_1 where ctid = '(1,1)';
 select * from _my_cluster.sl_log_1 where ctid = '(1,2)';
 select * from _my_cluster.sl_log_1 where ctid = '(1,3)';
 select * from _my_cluster.sl_log_1 where ctid = '(1,4)';

all return zero rows.  When I do a dump of that file, I get:

Block    1 ********************************************************
<Header> -----
Block Offset: 0x00002000         Offsets: Lower     408 (0x0198)
Block: Size 8192  Version    2            Upper    7680 (0x1e00)
LSN:  logid    955 recoff 0x0daed68c      Special  8192 (0x2000)
Items:   97                   Free Space: 7272
Length (including item array): 412

<Data> ------
Item   1 -- Length:  121  Offset: 8068 (0x1f84)  Flags: USED
  XMIN: 1491480520  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
  Block Id: 1  linp Index: 1   Attributes: 6   Size: 32
  infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

...so the rows are in the file but not visible in the table.  I did the
filedump first, then the query, then redid the filedump; the results
were the same, and the rows are still in the file.  I have no idea how
frequently these files get written to; I assume frequently.  I also
looked at the last block listed in the file, 6445, checked items 1-4
there, and did not find them in the table either, using selects similar
to the ones above.  That seems kinda strange, since there are 11,000
rows in that table right this second, but I'll roll with it for a while.

Intrigued, I wanted to see what a filedump looked like for a row that
WAS in the table:

 ctid  | log_origin |  log_xid   | log_tableid | log_actionseq | log_cmdtype |
 (7,1) |         10 | 1491481037 |           8 |     473490934 | I           | (memberid,answerid,taskinstanceid) values ('144854','148707','0')

Block    7 ********************************************************
<Header> -----
Block Offset: 0x0000e000         Offsets: Lower     424 (0x01a8)
Block: Size 8192  Version    2            Upper     508 (0x01fc)
LSN:  logid    955 recoff 0x0dc4bcc0      Special  8192 (0x2000)
Items:  101                   Free Space:   84
Length (including item array): 428

<Data> ------
Item   1 -- Length:  129  Offset: 8060 (0x1f7c)  Flags: USED
  XMIN: 1491481037  CMIN: 7  XMAX: 0  CMAX|XVAC: 0
  Block Id: 7  linp Index: 1   Attributes: 6   Size: 32
  infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID)

...the NextXID was (taken about 5 mins after the previous filedump):
 Latest checkpoint's NextXID:          1491498183

I don't see any real differences between the file entry for a row that
is in the table and the entry for one that I don't see in the table.
I hope I'm getting this right; it's totally fascinating seeing how
all this works.
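
A minimal sketch of that comparison, assuming the usual modulo-2^32
ordering of XIDs (a signed 32-bit difference).  The function names are
hypothetical; the numbers are the XMIN and NextXID values shown above.

```python
def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b (modulo-2**32 ordering).

    Assumes both are normal XIDs; special XIDs are not handled here.
    """
    diff = (a - b) & 0xFFFFFFFF
    # Read the 32-bit difference as signed: a "negative" value means a is older.
    return diff >= 0x80000000


def xid_distance(older: int, newer: int) -> int:
    """Number of XIDs from older up to newer, modulo 2**32."""
    return (newer - older) & 0xFFFFFFFF


next_xid = 1491498183        # Latest checkpoint's NextXID
xmin_invisible = 1491480520  # block 1, item 1 (not returned by the select)
xmin_visible = 1491481037    # block 7, item 1 (returned by the select)

for label, xmin in (("invisible", xmin_invisible), ("visible", xmin_visible)):
    print(label, xid_precedes(xmin, next_xid), xid_distance(xmin, next_xid))
```

By XID ordering alone, both XMINs are in the recent past relative to
NextXID, so this check does not separate the two rows.  Visibility also
depends on commit status and snapshots, which the sketch does not model.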

About your other questions:

1.  I have pg_clog segments all the way back to the start of the
database, all the way back to March 14th, 2006 (most likely when the
database was first brought up on this machine).  The numeric names
start at 0000 and go to 058E.  I checked the recent postmaster logs
(within the last 8 days) and saw no errors containing the word
'checkpoint'.  In fact, very few errors at all.
The dang thing seems to be running pretty well, just a little slow.
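
As a rough cross-check of those filenames, here is a sketch that maps an
XID to its pg_clog segment name.  It assumes a default build: 8 kB
pages, 2 status bits per transaction, and 32 pages per segment file.
The helper names are hypothetical.

```python
BLCKSZ = 8192            # assumed default page size, in bytes
XACTS_PER_BYTE = 4       # 2 commit-status bits per transaction
PAGES_PER_SEGMENT = 32   # assumed pages per pg_clog segment file
XIDS_PER_SEGMENT = BLCKSZ * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def segment_for_xid(xid: int) -> str:
    """Name of the pg_clog segment file holding this XID's status bits."""
    return format(xid // XIDS_PER_SEGMENT, "04X")


def xid_range_for_segment(name: str) -> tuple[int, int]:
    """First and last XID covered by the named segment file."""
    n = int(name, 16)
    return n * XIDS_PER_SEGMENT, (n + 1) * XIDS_PER_SEGMENT - 1


next_xid = 1491498183  # Latest checkpoint's NextXID from pg_controldata
print(segment_for_xid(next_xid))
print(xid_range_for_segment("058E"))

# Files 0000 through 058E inclusive, and their combined size in bytes.
n_files = int("058E", 16) + 1
print(n_files, n_files * PAGES_PER_SEGMENT * BLCKSZ)
```

Under those assumptions the newest file, 058E, is the segment that
holds the current NextXID, which fits with no segment having been
removed since XID 0.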

mydb=# select datname, age(datfrozenxid) from pg_database;
  datname  |    age
-----------+------------
 template1 | 1491520270
 template0 | 1491520270
 postgres  | 1491520270
 mydb      | 1076194904

Oooooooo..... that's not good, is it?  Thanks for taking an interest,
Tom.  I'm most likely going to promote one of my subscribers to be
master, then nuke this database, but I have no problem keeping it
around if you think I may have found some obscure bug that it could
help someone debug.  Again, this DB gets vacuumed every day, and in the
beginning, I think I remember doing a vacuum full every day.
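
To put those ages in context, here is a small sketch comparing each
one against the 2^31 wraparound horizon and against the guideline of
vacuuming within roughly a billion transactions.  Both thresholds are
assumptions taken from general PostgreSQL guidance, not from this
thread.

```python
WRAP_HORIZON = 2**31       # assumed: XIDs older than this compare as "future"
VACUUM_GUIDELINE = 10**9   # assumed: vacuum at least once per billion XIDs

# age(datfrozenxid) values from the query output above
ages = {
    "template1": 1491520270,
    "template0": 1491520270,
    "postgres": 1491520270,
    "mydb": 1076194904,
}


def headroom(age: int) -> int:
    """XIDs left before the frozen-XID age reaches the wraparound horizon."""
    return WRAP_HORIZON - age


def fraction_used(age: int) -> float:
    """Share of the wraparound horizon already consumed."""
    return age / WRAP_HORIZON


for name, age in ages.items():
    flag = "over guideline" if age > VACUUM_GUIDELINE else "ok"
    print(f"{name:10s} headroom={headroom(age):>10d} "
          f"used={fraction_used(age):.0%} {flag}")
```

Dividing the headroom by the daily XID burn rate gives a rough number
of days before the horizon is reached.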

Thanks,

/kurt

On Jun 20, 2007, at 5:08 PM, Tom Lane wrote:

Kurt Overberg <kurt@xxxxxxxxxxxxxxxxx> writes:
Drat!  I'm wrong again.  I thought for sure there wouldn't be a
wraparound problem.

Well, I'm not sure what it is now.  You showed some invisible tuples
with XMINs of
   XMIN: 1489323584  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
   XMIN: 1489323590  CMIN: 2  XMAX: 0  CMAX|XVAC: 0
   XMIN: 1489323592  CMIN: 1  XMAX: 0  CMAX|XVAC: 0
but the nextXID is
         1490547335
which is not that far ahead of those --- about 1.2 million transactions,
or less than a day's time according to the clog timestamps, which
suggest that you're burning several million XIDs a day.  Perhaps you've
wrapped past them since your earlier check --- if you try the same
"select where ctid = " queries now, do they show rows?

The other thing that's strange here is that an 8.0 installation should
be pretty aggressive about recycling pg_clog segments, and yet you've
got a bunch there.  How far back do the files in pg_clog go --- what's
the numeric range of the filenames, and the date range of their mod
times?  Have you checked the postmaster log to see if you're getting any
complaints about checkpoint failures or anything like that?  It would
also be useful to look at the output of
select datname, age(datfrozenxid) from pg_database;

			regards, tom lane
