Re: temp tables, sessions, pgpool and disk

Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> · Mon, 22 Jun 2009 09:53:59 +0200

On Mon, 22 Jun 2009 07:26:56 +0800
Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> wrote:

> > http://www.postgresql.org/docs/8.3/interactive/sql-createtable.html
> > "If specified, the table is created as a temporary table.
> > Temporary tables are automatically dropped at the end of a
> > session"
> > 
> > I'd interpret it as a connection.
> 
> Correctly.
> 
> > I don't even know if it is possible to send more than one command
> > over a single connection and wait for the results asynchronously.
> > Any clarification?
> 
> To an extent cursors provide that ability. The "result" is returned
> quite promptly, but it's a placeholder that allows you to retrieve
> the real results progressively as you need them. Whether the
> database generates the results immediately and stores them to
> return later, or whether it generates them on demand, isn't
> something you can easily tell it's up to the database.

http://www.postgresql.org/docs/8.3/interactive/libpq-async.html
"PQsendQuery cannot be called again (on the same connection) until
PQgetResult has returned a null pointer, indicating that the command
is done."

Asynchronous calls can't be made to parallelize postgres queries on
the same "session", but just to parallelize client and server work.

So a temp table seems as private as I need it.

I wonder what will happen if I put something like pgpool between
postgresql and a web app.

> > So postgresql actually issues writes to disk and delegate to the
> > OS management of the cache/actual write on disk.

> Yes.

> > I thought it could just try to hold them in RAM and still
> > delegate to the OS to save them on disk in swap if the system is
> > short on RAM.

> For a variety of reasons, you REALLY don't want it to work that
> way.

mmm... first sorry for the noise...
Interpret the following as reality checks.
I'm perfectly aware building up a DB is not easy, and I'm not
pretending I know how to write one. ;)

> OS memory managers tend to be _much_ better and faster at managing
> pages that're backed by a file. They'll write dirty data out
> pre-emptively so that execution doesn't stall when memory runs
> low; they write data to the file in order for best disk
> performance; they efficiently buffer and read-ahead when pulling
> the data back in, etc.

> The OS knows much less about what anonymous memory (memory not
> backed by a file) "means" to a program and can't be as clever with
> it. Swapping tends to be _much_ more CPU expensive than writing

But issuing a write to disk Postgresql doesn't actually say anything
more about what it is placing on the disk and how it is going to
access it... and it is actually adding overhead to move it back and
forward, no matter if this overhead happens on RAM or disk.
Actually since temp table are private to the connection they should
(?) be private to a postgresql process, so the OS should be able to
do a good job.
I don't see any atomicity constraint, so... if something fail while
writing to RAM, as you said you shouldn't need a WAL.

> dirty buffers to a file. It's a lot more expensive to retrieve
> from disk, too, and usually involves lots of seeks for quite
> scattered pages instead of nice block readahead.

Once you're running out of memory I see no guaranty your file will
end up in a fast easily accessible area of your disk... and you're
going to add the overhead associated with a file system
(journalling, permissions/ownership, locks) swap is volatile... and
it should offers the guaranty you "need" for a temp table.

> The OS knows much better than PostgreSQL does when the table will
> fit in RAM and when it needs to spill to disk, and it's much
> better at managing that than Pg can ever be. It's great that Pg
> just uses the OS's hardware knowledge, system-wide awareness, and
> highly optimised memory manager + disk IO management to take care
> of the problem.

The same should be true for virtual memory, not just file management
and postgresql has a configuration file that should give a clue to
the DB about the expected workload and hardware.
Surely postgresql can't forecast how many and how large the temp
tables for a single connection will be... but substantially I got
the idea that a connection is somehow serial in its execution and
that storage could be garbage collected or just released early
(drop table, on commit drop).
This looks as it is taking temp tables very far from the standard.
And yeah... once you want to do memory management/resource management
inside SQL you've opened the doors of Hell.
But well 

For what I could see about SQL99 the definition of temp table is very
terse... and a bit confusing (at least for me) about global and
local.
I gave a quick look at what's available on MS SQL... and they have
an sort of "in memory temp table" but you can't modify its schema.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general