It figures I'd have an idea right after posting to the mailing list.
I took the id of the last row the COPY TO command was able to read normally and tried to find the next id. The results below made me suspect some kind of serious corruption (even before I got responses that agreed):
Assuming that the last id copied was 1500:
1) select * from foo where id = (select min(id) from foo where id > 1500);
Results in 0 rows
2) select min(id) from foo where id > 1500;
Results in, for example, 200000
3) select max(id) from foo where id > 1500;
Results in, for example, 90000 (a much lower number than returned by min)
4) select id from foo where id > 1500 order by id asc limit 10;
Results in (for example):
200000
202000
210273
220980
15005
15102
15104
15110
15111
15113
So ... yes, it seems that those four ids (the ones that sort ahead of the smaller values) are somehow part of the problem.
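With a longer result than ten rows, spotting the break by eye gets tedious. A minimal sketch of an automated check, assuming the ids from query 4 have already been fetched into a list in the order the server returned them (the function name ordering_breaks is made up for illustration):

```python
def ordering_breaks(ids):
    """Return (position, previous_id, current_id) for every place where a
    list that is supposed to be ascending actually goes down."""
    breaks = []
    for pos in range(1, len(ids)):
        if ids[pos] < ids[pos - 1]:
            breaks.append((pos, ids[pos - 1], ids[pos]))
    return breaks


# Example values from the output of query 4 above.
result = [200000, 202000, 210273, 220980,
          15005, 15102, 15104, 15110, 15111, 15113]

print(ordering_breaks(result))  # [(4, 220980, 15005)]
```

Everything before the reported position is suspect here, since a correct ORDER BY id ASC should never hand back 220980 ahead of 15005.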
They're on Amazon EC2 boxes (yeah, we're not too fond of the EC2 boxes either), so memtest isn't available. However, no new corruption has cropped up since they stopped killing the waiting queries. I just double checked: they were getting corrupted rows constantly, and we haven't seen one since that script stopped killing queries.
We're going to have them attempt to delete the rows with those ids (even though the rows don't appear to exist). If that fails, we're going to run copy (select * from foo where id not in (<list>)) to file, then drop table foo, create table foo, and copy foo from file. I'll try to remember to write back with whether or not any of those things worked.
On Wed, Sep 8, 2010 at 1:30 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Sam Nelson <samn@xxxxxxxxxxxxxxxxxxx> writes:
> pg_dump: Error message from server: ERROR: invalid memory alloc request
> size 18446744073709551613
> pg_dump: The command was: COPY public.foo (<columns>) TO stdout;
> That seems like an incredibly large memory allocation request - it shouldn't
> be possible for the table to really be that large, should it? Any idea what
> may be wrong if it's actually trying to allocate that much memory for a copy
> command?
What that looks like is data corruption; specifically, a bogus length
word for a variable-length field.
regards, tom lane
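
A side note on the number in the quoted error: 18446744073709551613 is 2^64 - 3, which is what -3 looks like when read as an unsigned 64-bit integer. That would fit a garbage length word, but it is only an interpretation of the number; nothing in the log confirms how the server arrived at it. A small sketch of the arithmetic (the helper name as_signed64 is made up for illustration):

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


reported = 18446744073709551613  # size from the quoted error message
print(as_signed64(reported))  # -3
```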