Hi Tom,
It was a backend crash on my production 7.4 instance, running on Suse
Linux 9.1 Pro, installed from Suse's precompiled binaries, on what turns
out to be questionable hardware.
I wrote a PL/Perl function specifically to crash the database in this
way, and ran it against a test instance on Suse Linux 9.2 Pro running
under Xen, on 8.0.2 from the precompiled binaries supplied by Suse.
Here's the test function:
CREATE OR REPLACE FUNCTION internal.test_crash()
  RETURNS SETOF "varchar" AS
$BODY$
# This function is intended to crash the test server.
# DO NOT USE in production.
my $current_output;
my $retval = [];
my $data_handle = spi_exec_query(
    'SELECT * FROM schema1."table_of_approximately 30000 rows";');
my $nrows = $data_handle->{processed};
foreach my $tt (0 .. 10000) {
    foreach my $rn (0 .. $nrows - 1) {           # Loop over rows of data
        my $row = $data_handle->{rows}[$rn];     # Fetch a row from the view
        $current_output->{"storageRowNumber"} = $row->{"storageRowNumber"};
        my %output_copy = %{$current_output};    # Copy the hash...
        push @$retval, \%output_copy;            # ...and accumulate it in memory
    }
}
return $retval;
$BODY$
LANGUAGE 'plperl' VOLATILE;
ALTER FUNCTION internal.test_crash() OWNER TO testdbowner;
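Invoking it is just an ordinary set-returning call; I won't swear to
the exact line I typed, but it was simply:

SELECT * FROM internal.test_crash();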
Before running the above function I logged in from my workstation
using pgAdmin (the one that ships with v8.1) and ran a simple SELECT
query, which worked fine. I then ran the function from inside a psql
shell on the server itself. Watching top, the postmaster process
associated with that query just grew until it exhausted the available
memory (a couple hundred megabytes) and was killed:
Dec 5 14:54:33 dbtestserver kernel: Out of Memory: Killed process 3294 (postmaster).
The rest of the /var/log/messages output is below.
Here's the process tree after the process was killed:
2379 ? S 0:00 /usr/bin/postmaster -D /var/lib/pgsql/data
3298 ? S 0:00 \_ postgres: writer process
3299 ? S 0:00 \_ postgres: stats buffer process
3300 ? S 0:00 \_ postgres: stats collector process
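(Incidentally, to map the PID that top or ps shows back to a database
connection, a query along these lines works on 8.0; note that
current_query only shows real text if stats_command_string is enabled:)

SELECT procpid, usename, datname, current_query
FROM pg_stat_activity;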
In pgAdmin I then attempted to rerun my previous simple SELECT query and
got the error message:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
That makes me believe that all clients get disconnected, presumably
because the postmaster resets all of its backends after one of them
dies abnormally (this did in fact happen when I accidentally ran the
large query on my production server).
Now when I run a query with a large result set from pgAdmin3:
SELECT * FROM internal.audit_log INNER JOIN audit_tables.cases
ON cases.audit_log_id = audit_log."auditID", parts
ORDER BY audit_log.ts DESC
LIMIT 200;
(audit_log has about 500K rows, cases about 1000 and parts about 10000),
top shows memory use jumping around at about 297MB, and psql from an
SSH connection to the server remains responsive. It's quite possible
that the other large-query crash was due to bad memory or a bad
motherboard on the other server. Before this query completes, pgAdmin3
returns:
ERROR: could not write block 122159 of temporary file: No space left
on device
HINT: Perhaps out of disk space?
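A rough back-of-envelope check (assuming each cases row matches a
single audit_log row; the real fan-out may be higher):

  1,000 joined rows (audit_log INNER JOIN cases)
  x 10,000 rows of parts (cross-joined by the bare ", parts")
  = 10,000,000 rows

Because of the ORDER BY, all of those rows have to be sorted before the
LIMIT 200 can be applied, so the sort spills to temporary files until
the disk fills.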
So the error message seems correct: the query really is exhausting the
available disk space. I guess the problem is limited to PL/Perl
functions, then, and the other case occurred due to bad hardware.
Thanks for pointing me in the right direction.
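As a belt-and-braces measure I'll also try the ulimit route. A minimal
sketch, untested on my setup, assuming the postmaster is launched from
a shell (the limit is inherited by every backend it forks):

ulimit -v 262144   # cap address space at ~256MB per process (value in kB)
/usr/bin/postmaster -D /var/lib/pgsql/data

With a cap like that in place, the runaway PL/Perl function should fail
with an ordinary "out of memory" error instead of being picked off by
the kernel's OOM killer.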
Cheers,
Eric
Tom Lane wrote:
> Eric E <whalesuit@xxxxxxxxx> writes:
>> I've recently had a couple of worrisome experiences with runaway
>> queries in postgres 7.4 and 8.0, and I'm wondering what I can do to
>> control or prevent these. When I ran a query against a large table, I
>> accidentally mistyped a table name, and a table was automatically
>> added, resulting in a massive result set that crashed my postmaster
>> instance.
>
> If it crashed your backend (as opposed to your client), we'd like to
> see more details.
>
>> Is there some way to limit the amount of memory a single query
>> process can use?
>
> See ulimit.
>
> regards, tom lane