I have spent the last month battling this and looking deeper into the issue. Here's a summary of where I'm at:

- Increasing shared buffers improved performance but did not resolve the backend FATAL disconnect error.
- Dumping and recreating the entire database also did not resolve the issue.
- Re-initializing the DB and recreating it from the dump also did not resolve the issue.

In both of the latter two cases the issue re-occurred within 2-3 days of run-time (inserting new records).

I have the issue narrowed down to the point where I can re-create it at will just by inserting enough data; the content of the data does not matter. The issue always occurs while inserting into my "teststeprun" table, which is the largest of my tables (~15 million rows). Once that table reaches a certain size, the backend crashes.

Since I was able to reproduce it, I decided to analyze the core dumps. Looking at them I immediately began to see a pattern, and the same pattern was already there in the initial core dumps from when the problem began two months ago. In every case the dump indicates that the last instruction was in the call to tag_hash(). I also noticed that each time, the values passed to tag_hash() that are used to generate the key were just below the 32-bit maximum, and tag_hash() should be returning a uint32 value. I'm now really suspecting that there is some issue with this.

This is likely already resolved in newer versions; BTW, I am planning to upgrade to 8.2 in the coming months. But I'd still like to know if anyone out there knows about this issue and can confirm my suspicions.
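To make the suspicion concrete: in the traces below, tag_hash() is handed a pointer to a small fixed-size key (the "c" second argument is 0xc = 12 bytes, i.e. three 32-bit words) and folds it into a uint32 for the buffer-table lookup. Here is a minimal sketch of that style of hash, purely for illustration; the struct layout, field names, and prime constant are my assumptions, not the actual 7.0 source:

#include <stdint.h>
#include <stdio.h>

/* Rough stand-in for the 12-byte tag that BufTableLookup hashes;
 * field names are guesses, not the real 7.0 BufferTag. */
typedef struct
{
    uint32_t dbId;      /* database */
    uint32_t relId;     /* relation */
    uint32_t blockNum;  /* block number (DUMP4's ReadBuffer call has ffffffff in this position) */
} DemoBufferTag;

#define DEMO_PRIME 1000003u

/* tag_hash-style fold of the key words into a uint32.  Unsigned
 * arithmetic wraps modulo 2^32, so key values just below the 32-bit
 * maximum still yield a well-defined uint32 result. */
static uint32_t
demo_tag_hash(const void *key, int keysize)
{
    const uint32_t *words = (const uint32_t *) key;
    uint32_t h = 0;
    int i;

    for (i = 0; i < keysize / 4; i++)
        h = DEMO_PRIME * h ^ words[i];
    return h;
}

int
main(void)
{
    /* hypothetical tag with a value near the 32-bit maximum */
    DemoBufferTag tag = { 17u, 16384u, 0xfffffffeu };

    printf("hash = %u\n", (unsigned) demo_tag_hash(&tag, (int) sizeof(tag)));
    return 0;
}

Compiled standalone, this just shows the fold is an ordinary unsigned computation; whether the real tag_hash() in 7.0 does anything different with values this large is exactly what I'd like someone to confirm.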
Below are the traces of the four core dumps which point to the issue I'm suspecting.

DUMP1
core 'pg_core.20070603' of 22757: /home/ttadmin/postgresql/bin/postmaster -D /home/ttadmin/postgresql/da
001db000 tag_hash (ff4001d8, c, 1daff4, fef8ccd0, fef8d00c, 1) + c
001da134 call_hash (263370, ff4001d8, 0, 1, 301688, 300740) + 2c
001da1f8 hash_search (263370, ff4001d8, 0, ff40015b, 301688, 300740) + 64
00144ae8 BufTableLookup (ff4001d8, ff4001d8, 125aa000, 244c00, 92d4, fefe3c) + 3c
001458e4 BufferAlloc (2d23e8, 3, ff40027f, 0, 92d3, ff1764) + 94
00145420 ReadBufferWithBufferLock (2d23e8, 3, 0, 0, 92d2, ff00dc) + f0
00145314 ReadBuffer (2d23e8, 3, 0, fef8d360, fef8e020, ff001c) + 18
0005fdec _bt_getbuf (2d23e8, 3, 2, 10000, fef8ecb0, 3) + 28
000601fc _bt_getstackbuf (2d23e8, 3003c8, 2, fef8ccd0, fef8d00c, 1) + 28
0005bc8c _bt_insertonpg (2d23e8, 1, 3003c8, 1, 301688, 300740) + d04

DUMP2
core 'core' of 9640: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
001db000 tag_hash (ff4001c8, c, 1daff4, fd4f13d0, fd4f170c, 1) + c
001da134 call_hash (262b50, ff4001c8, 0, 1, 465cf8, 464db0) + 2c
001da1f8 hash_search (262b50, ff4001c8, 0, ff40014b, 465cf8, 464db0) + 64
00144ae8 BufTableLookup (ff4001c8, ff4001c8, 124f6000, 244c00, 927a, 10efd64) + 3c
001458e4 BufferAlloc (428980, 3, ff40026f, 0, 9279, 10efd7c) + 94
00145420 ReadBufferWithBufferLock (428980, 3, 0, 0, 9278, 10efd94) + f0
00145314 ReadBuffer (428980, 3, 0, fd4f1a60, fd4f2720, 10efdac) + 18
0005fdec _bt_getbuf (428980, 3, 2, 10000, fd4f33b0, 3) + 28
000601fc _bt_getstackbuf (428980, 464a38, 2, fd4f13d0, fd4f170c, 1) + 28

DUMP3
core 'core' of 10466: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
001db000 tag_hash (ff4001c8, c, 1daff4, fd6093d0, fd60970c, 1) + c
001da134 call_hash (262b50, ff4001c8, 0, 1, 463be0, 462c98) + 2c
001da1f8 hash_search (262b50, ff4001c8, 0, ff40014b, 463be0, 462c98) + 64
00144ae8 BufTableLookup (ff4001c8, ff4001c8, 124f6000, 244c00, 927a, 10ef044) + 3c
001458e4 BufferAlloc (428848, 3, ff40026f, 0, 9279, 10ef05c) + 94
00145420 ReadBufferWithBufferLock (428848, 3, 0, 0, 9278, 10ef074) + f0
00145314 ReadBuffer (428848, 3, 0, fd609a60, fd60a720, 10ef08c) + 18
0005fdec _bt_getbuf (428848, 3, 2, 10000, fd60b3b0, 3) + 28
000601fc _bt_getstackbuf (428848, 462920, 2, fd6093d0, fd60970c, 1) + 28

DUMP4
core 'core' of 12400: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
001db000 tag_hash (ff4001f0, c, 1daff4, 19c7, ce, 1) + c
001da134 call_hash (262b50, ff4001f0, 0, 22, ffffffff, 455bd0) + 2c
001da1f8 hash_search (262b50, ff4001f0, 0, ff400173, f6000000, 0) + 64
00144ae8 BufTableLookup (ff4001f0, ff4001f0, 0, 7, 246000, 1146554) + 3c
001458e4 BufferAlloc (4273d8, ffffffff, ff400297, 0, 0, 0) + 94
00145420 ReadBufferWithBufferLock (4273d8, ffffffff, 0, 0, 0, 0) + f0
00145314 ReadBuffer (4273d8, ffffffff, 6, 260000, 0, 0) + 18
0005fe2c _bt_getbuf (4273d8, ffffffff, 2, 455f18, f9583a60, ffbaa0e8) + 68
0005cb8c _bt_split (4273d8, 1, 456c48, 19c7, ce, 1) + 28
0005bd38 _bt_insertonpg (4273d8, 19c7, 455988, 1, 456c48, 455bd0) + db0

Rgds/
Jose Arteaga

-----Original Message-----
From: ARTEAGA Jose
Sent: Thursday, June 07, 2007 12:32 PM
To: Alvaro Herrera; Richard Huxton
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: RE: [GENERAL] Limitations on 7.0.3?

I've looked at the pg_index table and we are currently at 15 million entries, which should be OK. After 2-3 days of runtime I just get a disconnect error from the backend while doing an insert. After I restore the DB and insert the same entries, it runs fine. Following is the error I get:

"Query pgsql8.1: PGRES_FATAL_ERROR, pqReadData() -- backend closed the channel unexpectedly.\n\tThis probably means the backend terminated abnormally\n\tbefore or while processing the request.\n INSERT INTO teststeprun (description, stepunqid, stepid, stepstarttime, duration, parentsn, stepresult) VALUES ('Qos Reset', '24920757/3267', ' 95. 4', '2007-06-02 19:02:05', '0 0:0:25', '5311955', '0')"

Also worth mentioning is that just this week I found out about a very, very important parameter: "shared buffers". Ever since the original person set up our PG (an individual no longer with us), this DB had been running without any major glitches, albeit slowly. All this time the shared buffers were running at the default of "64" (8192 block size). Once I got this back up and running, I set it to 16000 shared buffers (~125MB). I've since noticed a dramatic performance improvement, and I hope I've struck gold, but I can't claim victory yet; it's only been up for 2 days.
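In concrete terms (my arithmetic, assuming the 8192-byte block size mentioned above):

    64 buffers    x 8192 bytes  =  512 KB   (the old default)
    16000 buffers x 8192 bytes  ~= 125 MB   (the new setting)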
-Jose

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@xxxxxxxxxxxxxxxxx]
Sent: Thursday, June 07, 2007 7:02 AM
To: Richard Huxton
Cc: ARTEAGA Jose; pgsql-general@xxxxxxxxxxxxxx
Subject: Re: [GENERAL] Limitations on 7.0.3?

Richard Huxton wrote:
> ARTEAGA Jose wrote:
> > I have been using postgres 7.0.3 (Solaris 2.9) for the past 4 years
> > without any major problems, until about a month ago. We are now
> > experiencing crashes on the backend (connection lost to backend) while
> > running queries (inserts, etc). Anyway, I don't want to make this too
> > technical, but I'd just like to understand whether I'm running into
> > some sort of limit on the size of the database. My biggest table is
> > currently at 1.5B tuples.
> >
> > Would appreciate it if anyone could let me know if they are aware of
> > any limits with the 7.0 version.
>
> I don't remember any specific size limitations on the 7.0 series. For
> more detailed help you'll have to provide some specific error messages.

7.0 didn't have any protection against Xid wraparound. As soon as you hit the 4 billion transaction mark, your data suddenly disappeared. That's what I heard, at least -- I didn't have much experience with such an old beast; we switched rather quickly to 7.1.

--
Alvaro Herrera    http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers