I have spent the last month battling this and looking deeper into the issue. Here's a summary of where I'm at:

- Increasing shared buffers improved performance but did not resolve the backend FATAL disconnect error.
- Dumping and recreating the entire database also did not resolve the issue.
- Re-initializing the DB and recreating it from the dump also did not resolve the issue.

In both of the latter two cases the issue re-occurred within 2-3 days of run-time (inserting new records).

I have the issue narrowed down to the point where I can re-create it at will just by inserting enough data; the content of the data does not matter. The issue always occurs while inserting into my "teststeprun" table, which is the largest of my tables (~15 million rows). Once that table reaches a certain size, the backend crashes.

Since I was able to reproduce it, I decided to analyze the core dumps. Looking at them I immediately began to see a pattern, and the same pattern was already there in the initial core dumps from when the problem began two months ago. In every case the dump indicates that the last instruction was in the call to tag_hash(). I also noticed that each time, the values passed to tag_hash() that are used to generate the key were just below the 32-bit maximum, and tag_hash() should be returning a uint32 value. I'm now really suspecting that there is some issue with this.

This is likely already resolved in newer versions; BTW, I am planning to upgrade to 8.2 in the coming months. But I'd still like to know if anyone out there knows about this issue and can confirm my suspicions.
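To make the suspicion concrete: in the traces below, tag_hash() is handed a pointer to a small fixed-size key (the "c" second argument is 0xc = 12 bytes, i.e. three 32-bit words) and folds it into a uint32 for the buffer-table lookup. Here is a minimal sketch of that style of hash, purely for illustration; the struct layout, field names, and prime constant are my assumptions, not the actual 7.0 source:

#include <stdint.h>
#include <stdio.h>

/* Rough stand-in for the 12-byte tag that BufTableLookup hashes;
 * field names are guesses, not the real 7.0 BufferTag. */
typedef struct
{
    uint32_t dbId;      /* database */
    uint32_t relId;     /* relation */
    uint32_t blockNum;  /* block number (DUMP4's ReadBuffer call has ffffffff in this position) */
} DemoBufferTag;

#define DEMO_PRIME 1000003u

/* tag_hash-style fold of the key words into a uint32.  Unsigned
 * arithmetic wraps modulo 2^32, so key values just below the 32-bit
 * maximum still yield a well-defined uint32 result. */
static uint32_t
demo_tag_hash(const void *key, int keysize)
{
    const uint32_t *words = (const uint32_t *) key;
    uint32_t h = 0;
    int i;

    for (i = 0; i < keysize / 4; i++)
        h = DEMO_PRIME * h ^ words[i];
    return h;
}

int
main(void)
{
    /* hypothetical tag with a value near the 32-bit maximum */
    DemoBufferTag tag = { 17u, 16384u, 0xfffffffeu };

    printf("hash = %u\n", (unsigned) demo_tag_hash(&tag, (int) sizeof(tag)));
    return 0;
}

Compiled standalone, this just shows the fold is an ordinary unsigned computation; whether the real tag_hash() in 7.0 does anything different with values this large is exactly what I'd like someone to confirm.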
Below are the traces of the four core dumps which point to the issue I'm suspecting.

DUMP1
core 'pg_core.20070603' of 22757: /home/ttadmin/postgresql/bin/postmaster -D /home/ttadmin/postgresql/da
001db000 tag_hash (ff4001d8, c, 1daff4, fef8ccd0, fef8d00c, 1) + c
001da134 call_hash (263370, ff4001d8, 0, 1, 301688, 300740) + 2c
001da1f8 hash_search (263370, ff4001d8, 0, ff40015b, 301688, 300740) + 64
00144ae8 BufTableLookup (ff4001d8, ff4001d8, 125aa000, 244c00, 92d4, fefe3c) + 3c
001458e4 BufferAlloc (2d23e8, 3, ff40027f, 0, 92d3, ff1764) + 94
00145420 ReadBufferWithBufferLock (2d23e8, 3, 0, 0, 92d2, ff00dc) + f0
00145314 ReadBuffer (2d23e8, 3, 0, fef8d360, fef8e020, ff001c) + 18
0005fdec _bt_getbuf (2d23e8, 3, 2, 10000, fef8ecb0, 3) + 28
000601fc _bt_getstackbuf (2d23e8, 3003c8, 2, fef8ccd0, fef8d00c, 1) + 28
0005bc8c _bt_insertonpg (2d23e8, 1, 3003c8, 1, 301688, 300740) + d04

DUMP2
core 'core' of 9640: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
001db000 tag_hash (ff4001c8, c, 1daff4, fd4f13d0, fd4f170c, 1) + c
001da134 call_hash (262b50, ff4001c8, 0, 1, 465cf8, 464db0) + 2c
001da1f8 hash_search (262b50, ff4001c8, 0, ff40014b, 465cf8, 464db0) + 64
00144ae8 BufTableLookup (ff4001c8, ff4001c8, 124f6000, 244c00, 927a, 10efd64) + 3c
001458e4 BufferAlloc (428980, 3, ff40026f, 0, 9279, 10efd7c) + 94
00145420 ReadBufferWithBufferLock (428980, 3, 0, 0, 9278, 10efd94) + f0
00145314 ReadBuffer (428980, 3, 0, fd4f1a60, fd4f2720, 10efdac) + 18
0005fdec _bt_getbuf (428980, 3, 2, 10000, fd4f33b0, 3) + 28
000601fc _bt_getstackbuf (428980, 464a38, 2, fd4f13d0, fd4f170c, 1) + 28

DUMP3
core 'core' of 10466: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
001db000 tag_hash (ff4001c8, c, 1daff4, fd6093d0, fd60970c, 1) + c
001da134 call_hash (262b50, ff4001c8, 0, 1, 463be0, 462c98) + 2c
001da1f8 hash_search (262b50, ff4001c8, 0, ff40014b, 463be0, 462c98) + 64
00144ae8 BufTableLookup (ff4001c8, ff4001c8, 124f6000, 244c00, 927a, 10ef044) + 3c
001458e4 BufferAlloc (428848, 3, ff40026f, 0, 9279, 10ef05c) + 94
00145420 ReadBufferWithBufferLock (428848, 3, 0, 0, 9278, 10ef074) + f0
00145314 ReadBuffer (428848, 3, 0, fd609a60, fd60a720, 10ef08c) + 18
0005fdec _bt_getbuf (428848, 3, 2, 10000, fd60b3b0, 3) + 28
000601fc _bt_getstackbuf (428848, 462920, 2, fd6093d0, fd60970c, 1) + 28

DUMP4
core 'core' of 12400: /home/ttadmin/postgresql/bin/postmaster -B 18000 -D /home/ttadmin/post
001db000 tag_hash (ff4001f0, c, 1daff4, 19c7, ce, 1) + c
001da134 call_hash (262b50, ff4001f0, 0, 22, ffffffff, 455bd0) + 2c
001da1f8 hash_search (262b50, ff4001f0, 0, ff400173, f6000000, 0) + 64
00144ae8 BufTableLookup (ff4001f0, ff4001f0, 0, 7, 246000, 1146554) + 3c
001458e4 BufferAlloc (4273d8, ffffffff, ff400297, 0, 0, 0) + 94
00145420 ReadBufferWithBufferLock (4273d8, ffffffff, 0, 0, 0, 0) + f0
00145314 ReadBuffer (4273d8, ffffffff, 6, 260000, 0, 0) + 18
0005fe2c _bt_getbuf (4273d8, ffffffff, 2, 455f18, f9583a60, ffbaa0e8) + 68
0005cb8c _bt_split (4273d8, 1, 456c48, 19c7, ce, 1) + 28
0005bd38 _bt_insertonpg (4273d8, 19c7, 455988, 1, 456c48, 455bd0) + db0

Rgds/
Jose Arteaga

-----Original Message-----
From: ARTEAGA Jose
Sent: Thursday, June 07, 2007 12:32 PM
To: Alvaro Herrera; Richard Huxton
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: RE: [GENERAL] Limitations on 7.0.3?

I've looked at the pg_index table and we are currently at 15 million entries, which should be OK. After 2-3 days of runtime I just get a disconnect error from the backend while doing an insert. After I restore the DB and insert the same entries, it runs fine. Following is the error I get:

"Query pgsql8.1: PGRES_FATAL_ERROR, pqReadData() -- backend closed the channel unexpectedly.\n\tThis probably means the backend terminated abnormally\n\tbefore or while processing the request.\n INSERT INTO teststeprun (description, stepunqid, stepid, stepstarttime, duration, parentsn, stepresult) VALUES ('Qos Reset', '24920757/3267', ' 95. 4', '2007-06-02 19:02:05', '0 0:0:25', '5311955', '0')"

Also worth mentioning is that just this week I found out about a very, very important parameter: "shared buffers". Ever since the original person set up our PG (an individual no longer with us), this DB had been running without any major glitches, albeit slowly. All this time the shared buffers were running at the default of "64" (8192 block size). Once I got this back up and running, I set it to 16000 shared buffers (~125MB). I've since noticed a dramatic performance improvement, and I hope I've struck gold, but I can't claim victory yet; it's only been up for 2 days.
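In concrete terms (my arithmetic, assuming the 8192-byte block size mentioned above):

    64 buffers    x 8192 bytes  =  512 KB   (the old default)
    16000 buffers x 8192 bytes  ~= 125 MB   (the new setting)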
-Jose

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@xxxxxxxxxxxxxxxxx]
Sent: Thursday, June 07, 2007 7:02 AM
To: Richard Huxton
Cc: ARTEAGA Jose; pgsql-general@xxxxxxxxxxxxxx
Subject: Re: [GENERAL] Limitations on 7.0.3?

Richard Huxton wrote:
> ARTEAGA Jose wrote:
> > I have been using postgres 7.0.3 (Solaris 2.9) for the past 4 years
> > without any major problems, until about a month ago. We are now
> > experiencing crashes on the backend (connection lost to backend) while
> > running queries (inserts, etc). Anyway, I don't want to make this too
> > technical, but I'd just like to understand whether I'm running into
> > some sort of limit on the size of the database. My biggest table is
> > currently at 1.5B tuples.
> >
> > Would appreciate it if anyone could let me know if they are aware of
> > any limits with the 7.0 version.
>
> I don't remember any specific size limitations on the 7.0 series. For
> more detailed help you'll have to provide some specific error messages.

7.0 didn't have any protection against Xid wraparound. As soon as you hit the 4 billion transaction mark, your data suddenly disappeared. That's what I heard, at least -- I didn't have much experience with such an old beast; we switched rather quickly to 7.1.

--
Alvaro Herrera    http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers