Re: Pet Peeves?

Here are a few more pet peeves. I'm not sure whether any of these are known bugs or just me being picky.

--Chris

--------------------------------------------------

1. Having entire tables rewritten to disk the first time they are scanned, for example:

CREATE TABLE t1 AS ...; -- writes 100 GB to disk
CREATE INDEX i1 ON t1 ...; -- rewrites 100 GB to disk

The main issue is that the hint bits have to be set for each tuple, which IMO should happen up front for "CREATE TABLE AS" statements. To work around this for now, I modified heap_insert (in heapam.c) to mark tuples as committed when inserting them into newly added pages without WAL:

/*
 * Optimization for CREATE TABLE AS statements: mark tuples as committed
 * to prevent rewriting them to disk upon first use. This is safe since
 * the new relation is not visible until the transaction commits anyway.
 */
if (!use_wal && !use_fsm)
{
	tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
}
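
For anyone who doesn't want to patch the server, a possible SQL-level workaround is to VACUUM the new table before anything else touches it, since VACUUM sets hint bits as a side effect. This doesn't avoid the extra pass, but it should keep the page rewrite out of the first user-facing query (a sketch; I'm assuming the cost of the deliberate extra scan is acceptable):

CREATE TABLE t1 AS ...;    -- writes 100 GB to disk
VACUUM t1;                 -- scans the table once, setting hint bits
CREATE INDEX i1 ON t1 ...; -- should no longer need to dirty every page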

--------------------------------------------------

2. Having to use a subquery and/or "OFFSET 0" to prevent multiple calls to immutable functions returning composite types, for example:

CREATE TYPE three AS
  (i integer, j integer, k integer);

CREATE FUNCTION hello()
RETURNS three AS $$
DECLARE ret three;
BEGIN
  RAISE NOTICE 'hello';
  ret := ROW(1,2,3);
  RETURN ret;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- function called three times
SELECT (hello()).*;

-- function called only once
SELECT (h).* FROM (
  SELECT hello() AS h
) AS sub;

-- function called three times
SELECT (h).* FROM (
  SELECT hello() AS h
  FROM generate_series(1,3)
) AS sub;

-- function called only once
SELECT (h).* FROM (
  SELECT hello() AS h
  FROM generate_series(1,3)
  OFFSET 0
) AS sub;
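
On versions with CTE support (8.4 and later, if I'm not mistaken), wrapping the call in WITH should behave like the OFFSET 0 trick, since the CTE acts as an optimization fence and is planned separately (a sketch, reusing the hello() function above):

-- function should be called only once, as the CTE is not pulled up
WITH sub AS (
  SELECT hello() AS h
  FROM generate_series(1,3)
)
SELECT (h).* FROM sub;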

--------------------------------------------------

3. Poor estimates for n_distinct can result in out-of-memory errors.

For example, I have a user-defined aggregate over a user-defined data type (both written in C). The aggregate doesn't take much memory, but the data type can be relatively large (i.e. "STORAGE = extended"). My table has five million distinct rows, but n_distinct is around 50,000 (increasing the stats target helps a little, but it's still way off). As a result, the planner chooses "SeqScan + HashAgg" instead of "IndexScan + GroupAgg", and the query aborts when the hash table eventually runs out of memory.

I currently work around this issue using "SET enable_hashagg TO off;" when necessary.
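
To keep that setting from leaking into the rest of the session, it can be scoped to a single transaction with SET LOCAL (a sketch; the table, column, and aggregate names below are made up):

BEGIN;
SET LOCAL enable_hashagg TO off;  -- reverts automatically at COMMIT
SELECT my_agg(val) FROM big_table GROUP BY grp;
COMMIT;

Newer releases (9.0 and up, if I'm not mistaken) also allow overriding the estimate directly, which attacks the root cause rather than the symptom:

-- hypothetical table/column; the override takes effect at the next ANALYZE
ALTER TABLE big_table ALTER COLUMN grp SET (n_distinct = 5000000);
ANALYZE big_table;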

