Re: Bug? Query plans / EXPLAIN using gigabytes of memory

Willy-Bas Loos <willybas@xxxxxxxxx> · Wed, 25 Apr 2012 11:16:50 +0200

would it be possible to reproduce the same query without using any views?
you could see the difference in memory usage.

if that doesn't explain, try also without inheritance, by using the ONLY keyword (and UNION ALL).

If it's really only a couple of rows, you might as well post a dump somewhere? Then i could reproduce.

WBL

On Wed, Apr 25, 2012 at 10:51 AM, Toby Corkindale <toby.corkindale@xxxxxxxxxxxxxxxxxxxx> wrote:

Hi Willy-Bas,

Thanks for your reply.

I realise that stacking the views up like this complicates matters, but the actual views are fairly simple queries, and each one individually is only looking at a few dozen rows. (Eg. selecting min, max or average value from a small set, grouped by one column)

>From the point of view of creating reporting queries, it's a nice and logical way to build up a query, and we didn't think it would present any problems.. and even on a well-populated database, the query runs very fast. It's just the astounding amount of memory used that presents difficulties.

Looking at the postgresql.conf for non-default settings, the notable ones are:

max_connections = 200

ssl = false

shared_buffers = 256MB

max_prepared_transactions = 16

# although they aren't used for the group of queries in question

maintenance_work_mem = 128MB

# work_mem is left at default of 1MB

effective_io_concurrency = 2

random_page_cost = 3.0

effective_cache_size = 512MB

geqo = on

geqo_threshold = 12

geqo_effort = 7

Some other things are non-default, like checkpoints, streaming-replication stuff, but those shouldn't have any effect.

The memory settings (shared buffers, effective cache) might seem to be set quite conservatively at the moment, given the memory available in the machine -- but since we can exhaust that memory with just a few connections, it seems fair.

Cheers,

Toby

----- Original Message -----

From: "Willy-Bas Loos" <willybas@xxxxxxxxx>

To: "Toby Corkindale" <toby.corkindale@xxxxxxxxxxxxxxxxxxxx>

Cc: "pgsql-general" <pgsql-general@xxxxxxxxxxxxxx>

Sent: Wednesday, 25 April, 2012 6:05:37 PM

Subject: Re:  Bug? Query plans / EXPLAIN using gigabytes of memory

Stacking views is a bad practice. It usually means that you are making the db do a lot of unnecessary work, scanning tables more than once when you don't even need them.

According to your description, you have 3 layers of views on partitioned tables.

I can imagine that that leaves the planner with a lot of possible query plans, a lot of interaction and a lot of statistics to read.

do you have any special settings for the statistics on these tables?

and could you please post the non-default settings in your postgresql.conf file?

$ grep ^[^#] /etc/ postgresql /9.1/main/ postgresql . conf | grep -e ^[^[:space:]]

Would be helpful to see if you have any statistics or planner stuff altered.

Cheers,

WBL

On Wed, Apr 25, 2012 at 8:18 AM, Toby Corkindale < toby.corkindale@xxxxxxxxxxxxxxxxxxxx > wrote:

Hi,

I'm hitting some peculiar behaviour. I'm currently on Pg 9.1.3 on a 64bit Debian system.

I have a database which is moderately large - 20 GByte or so - and contains that data split up over dozens of tables, which are themselves partitioned.

Queries are usually only run against fairly small, partitioned, sets of data.

These queries generally run fairly fast. Performance is not a problem.

However Postgres is chewing up huge amounts of memory just to create the query plan!

For example, even if I just run

EXPLAIN SELECT a_column FROM a_view

WHERE partition_id = 1;

Then the postgres backend process takes several seconds to return, and in the worst example here, is hogging more than 3Gbyte once it comes back. (It doesn't free that up until you close the connection)

The query plan that comes back does seem quite convoluted, but then, the view is a query run over about eight other views, each of which is pulling data from a few other views. The actual underlying data being touched is only *a few dozen* small rows.

As I said, the query runs fast enough.. however we only need a handful of these queries to get run in separate connections, and the database server will be exhausted of memory. Especially since the memory isn't returned until the end of the connection, yet these connections typically stay up for a while.

I wondered if there's anything I can do to reduce this memory usage? And, is this a bug?

I've posted the output of the query plan here: https://gist.github.com/2487097

Thanks in advance,

Toby

--

Sent via pgsql-general mailing list ( pgsql-general@xxxxxxxxxxxxxx )

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-general

--

"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth

-- 
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth