Re: Pet Peeves?

Here are a few more pet peeves. I'm not sure whether any of these are known bugs or just me being picky.

--Chris

--------------------------------------------------

1. Having entire tables rewritten to disk the first time they are scanned, for example:

CREATE TABLE t1 AS ...; -- writes 100 GB to disk
CREATE INDEX i1 ON t1 ...; -- rewrites 100 GB to disk

The main issue is that the hint bits have to be set for each tuple, which IMO should happen up front for "CREATE TABLE AS" statements. To work around this for now, I modified heap_insert (in heapam.c) to mark tuples as committed when inserting them into newly added pages without WAL:

/*
 * Optimization for CREATE TABLE AS statements: mark tuples as committed
 * to prevent rewriting them to disk upon first use. This is safe since
 * the new relation is not visible until the transaction commits anyway.
 */
if (!use_wal && !use_fsm)
{
	tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
}
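
For anyone who doesn't want to patch the server, a possible SQL-level workaround is to VACUUM the new table before anything else touches it, since VACUUM sets hint bits as a side effect. This doesn't avoid the extra pass, but it should keep the page rewrite out of the first user-facing query (a sketch; I'm assuming the cost of the deliberate extra scan is acceptable):

CREATE TABLE t1 AS ...;    -- writes 100 GB to disk
VACUUM t1;                 -- scans the table once, setting hint bits
CREATE INDEX i1 ON t1 ...; -- should no longer need to dirty every page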

--------------------------------------------------

2. Having to use a subquery and/or "OFFSET 0" to prevent multiple calls to immutable functions returning composite types, for example:

CREATE TYPE three AS
  (i integer, j integer, k integer);

CREATE FUNCTION hello()
RETURNS three AS $$
DECLARE ret three;
BEGIN
  RAISE NOTICE 'hello';
  ret := ROW(1,2,3);
  RETURN ret;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- function called three times
SELECT (hello()).*;

-- function called only once
SELECT (h).* FROM (
  SELECT hello() AS h
) AS sub;

-- function called three times
SELECT (h).* FROM (
  SELECT hello() AS h
  FROM generate_series(1,3)
) AS sub;

-- function called only once
SELECT (h).* FROM (
  SELECT hello() AS h
  FROM generate_series(1,3)
  OFFSET 0
) AS sub;
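
On versions with CTE support (8.4 and later, if I'm not mistaken), wrapping the call in WITH should behave like the OFFSET 0 trick, since the CTE acts as an optimization fence and is planned separately (a sketch, reusing the hello() function above):

-- function should be called only once, as the CTE is not pulled up
WITH sub AS (
  SELECT hello() AS h
  FROM generate_series(1,3)
)
SELECT (h).* FROM sub;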

--------------------------------------------------

3. Poor estimates for n_distinct can result in out-of-memory errors.

For example, I have a user-defined aggregate over a user-defined data type (both written in C). The aggregate doesn't take much memory, but the data type can be relatively large (i.e. "STORAGE = extended"). My table has five million distinct rows, but n_distinct is around 50,000 (increasing the stats target helps a little, but it's still way off). As a result, the planner chooses "SeqScan + HashAgg" instead of "IndexScan + GroupAgg", and the query aborts when the hash table eventually runs out of memory.

I currently work around this issue using "SET enable_hashagg TO off;" when necessary.
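
To keep that setting from leaking into the rest of the session, it can be scoped to a single transaction with SET LOCAL (a sketch; the table, column, and aggregate names below are made up):

BEGIN;
SET LOCAL enable_hashagg TO off;  -- reverts automatically at COMMIT
SELECT my_agg(val) FROM big_table GROUP BY grp;
COMMIT;

Newer releases (9.0 and up, if I'm not mistaken) also allow overriding the estimate directly, which attacks the root cause rather than the symptom:

-- hypothetical table/column; the override takes effect at the next ANALYZE
ALTER TABLE big_table ALTER COLUMN grp SET (n_distinct = 5000000);
ANALYZE big_table;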

