On 10/30/2017 10:56 AM, Peter Geoghegan wrote:
> On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@xxxxxxxxx> wrote:
>> Peter, you beat me to the punch. I was just about to say "Having read the referenced message I thought I would add that we never delete from this table." In this particular case it was written to record by record, in a previous execution, and at the time of the error it was only being read. (In case you've been following, the failed execution would have added ~1M "segments", each of which references an entry in the gin'd table "probandsets" - but like a rookie I'm looking up each probandset (2^16) individually. Re-working that NOW.)
>
> It's not surprising that only a SELECT statement could see this problem. I guess that it's possible that only page deletions used for the pending list are involved here. I'm not sure how reliably you can recreate the problem, but if it doesn't take too long then it would be worth seeing what effect turning off the FASTUPDATE storage parameter for the GIN index has. That could prevent the problem from recurring, and would support my theory about what's up here. (It wouldn't fix the corruption, though.)
>
> Of course, what I'd much prefer is a self-contained test case. But if you can't manage that, or if reproducing the issue takes hours, then this simpler experiment might be worthwhile.

My test database machine is:
Not virtual

I've loaded thrice the number of records (190K) into the problem table, but no sign yet of the problem. But unlike the production lookup-notfind-insert (anti)pattern, these were all loaded in a single transaction. I think the following query has to read the gin'd column of every record:

select array_length(probands,1) as heads,

Happy as a clam. I'll try a run of the antipattern. I have NOT diddled FASTUPDATE at all.
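
For reference, the FASTUPDATE experiment suggested above might look roughly like the following. This is only a sketch: the index name "probandsets_probands_gin" is a made-up placeholder (the thread never names the index), and it assumes the GIN index sits on the probands column of probandsets.

```sql
-- Assumption: "probandsets_probands_gin" is a hypothetical name for the GIN
-- index on probandsets.probands; substitute the real index name.
ALTER INDEX probandsets_probands_gin SET (fastupdate = off);

-- Turning fastupdate off only stops new entries from going to the pending
-- list. Entries already queued there stay until they are flushed; a VACUUM
-- of the table moves them into the main index structure.
VACUUM probandsets;
```

On PostgreSQL 9.6 or later, calling gin_clean_pending_list() on the index should be an alternative to the VACUUM for flushing the pending list. As noted above, neither step repairs corruption that is already present; that would presumably need a REINDEX.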