Thanks Craig. Comments inline.

> -----Original Message-----
> From: Craig Ringer [mailto:craig@xxxxxxxxxxxxxxxxxxxxx]
> Sent: Thursday, February 19, 2009 12:56 PM
> To: Sameer Mahajan
> Cc: pgsql-general@xxxxxxxxxxxxxx
> Subject: Re: postgres wish list
>
> Sameer Mahajan wrote:
> > I recently worked on rewriting a fairly complex Berkeley DB based
> > system using Postgres DB. The following is a wish list, with a brief
> > explanation for each item. It is not necessarily a comparison of
> > postgres functionality with that of Berkeley DB, but rather a list of
> > useful functionality currently lacking in postgres. I would like some
> > feedback, comments, and suggestions regarding these items.
> >
> > * Shared memory based connectivity: As it stands, postgres has a
> > client-server model. The TCP/IP nature of its connectivity further
> > adds to the latency of this communication. It would be nice to have
> > shared memory based connectivity between the libpq front end and the
> > back end.
>
> Use UNIX domain sockets. You eliminate the need for TCP/IP completely,
> and get rather lower latency and faster communication between client
> and server. It's not shared memory, but it's really rather close in
> performance terms.
>
> Unlike with Berkeley DB, PostgreSQL's server runs at a different
> privilege level to its clients. Using shared memory for client/server
> communication would be more complex when you consider the security
> issues involved. Additionally, my understanding is that the backend is
> used to being able to arrange its memory in the most efficient way
> without worrying about what the client expects to be where and when,
> concerning itself with the accessibility of data the client doesn't
> have permission to see, etc.

[Sameer Mahajan] I will investigate how UNIX domain sockets help in my case. Why aren't they the default for postgres installations?
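The practical difference is just the `host` parameter of the libpq conninfo string: a value beginning with `/` names the directory holding the server's UNIX-domain socket rather than a hostname. A tiny illustrative sketch of building such strings (the helper and the `/var/run/postgresql` path are invented for the example; the real default socket directory is fixed at build time):

```python
def conninfo(dbname, host=None, port=5432):
    """Build a libpq-style conninfo string (illustrative helper only).

    libpq interprets a host value that begins with '/' as the directory
    containing the server's UNIX-domain socket; any other value is taken
    as a hostname or IP address and therefore selects TCP/IP. Omitting
    host falls back to the compiled-in default (normally the local
    socket directory).
    """
    parts = ["dbname=%s" % dbname, "port=%d" % port]
    if host is not None:
        parts.append("host=%s" % host)
    return " ".join(parts)

tcp_dsn = conninfo("mydb", host="localhost")             # TCP/IP
sock_dsn = conninfo("mydb", host="/var/run/postgresql")  # UNIX socket
```

The same string works anywhere a conninfo/DSN is accepted (psql, psycopg, etc.), so switching an application from TCP/IP to local sockets is usually a one-line configuration change.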
Or are they not believed to be generic enough / straightforward enough to configure?

> > o Nested transaction functionality: I followed quite a few
> > discussions stating that postgres considered it at some point but
> > later abandoned it due to its complexity. Savepoints are also
> > believed to provide similar semantics. However, it is useful and
> > handy to have simple nested transaction syntax and functionality.
>
> I guess I can see the appeal at the application programming level if
> porting an app that's used to nested transactions. Personally I'm very
> happy with savepoints, though - what about them doesn't satisfy your
> particular needs?
>
> I'd expect to be able to translate a BEGIN within an existing
> transaction to a `SAVEPOINT <generated-name>', a COMMIT to `RELEASE
> SAVEPOINT <generated-name>', and a ROLLBACK to `ROLLBACK TO SAVEPOINT
> <generated-name>'. Wouldn't that achieve the desired effect? Or is the
> problem that you don't want your application code to have to know
> whether it's being called from within an existing transaction or not?
>
> If the latter is the issue, you can provide a fairly simple
> context-aware wrapper that does the right thing:
>
>   trans_begin(connection_object)
>     -> examines the transaction state of the connection object
>        and, if there is no active transaction, issues BEGIN;
>        otherwise generates and stores a savepoint name.
>
> etc.
>
> You'd just use a simple wrapper structure around the libpq connection
> object to track your state.

[Sameer Mahajan] all this is a little bit of baggage to carry around in the application...

> > * Cursors
> >
> > o It would be useful to have updateable cursors. Otherwise the
> > application is required to figure out whether an INSERT or an UPDATE
> > is needed, amongst other things.
>
> Pg does have updatable cursors.
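For what it's worth, the context-aware wrapper Craig sketches above is only a few lines in practice. A self-contained sketch of the idea (Python's sqlite3 is used purely so the example runs anywhere, since it understands the same SAVEPOINT / RELEASE / ROLLBACK TO statements; the class and all names are invented, and a real libpq wrapper would track state around PQexec the same way):

```python
import sqlite3

class NestableTx:
    """Nested-transaction facade over flat transactions plus savepoints:
    the outermost begin() issues BEGIN, nested begin()s issue SAVEPOINTs,
    so application code never needs to know its nesting depth."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0

    def begin(self):
        if self.depth == 0:
            self.conn.execute("BEGIN")
        else:
            self.conn.execute("SAVEPOINT sp%d" % self.depth)
        self.depth += 1

    def commit(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.execute("COMMIT")
        else:
            self.conn.execute("RELEASE SAVEPOINT sp%d" % self.depth)

    def rollback(self):
        self.depth -= 1
        if self.depth == 0:
            self.conn.execute("ROLLBACK")
        else:
            self.conn.execute("ROLLBACK TO SAVEPOINT sp%d" % self.depth)
            self.conn.execute("RELEASE SAVEPOINT sp%d" % self.depth)

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v INTEGER)")
tx = NestableTx(conn)
tx.begin()                            # outer "transaction" -> BEGIN
conn.execute("INSERT INTO t VALUES (1)")
tx.begin()                            # inner "transaction" -> SAVEPOINT
conn.execute("INSERT INTO t VALUES (2)")
tx.rollback()                         # undoes only the inner insert
tx.commit()                           # commits the outer work
```

After this runs, the table holds only the outer insert, which is exactly the nested-transaction behaviour being asked for.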
> http://www.postgresql.org/docs/current/static/sql-declare.html
>
> See "WHERE CURRENT OF" in:
>
> http://www.postgresql.org/docs/current/static/sql-update.html
> http://www.postgresql.org/docs/current/static/sql-delete.html
>
> Also, for PL/PgSQL:
>
> http://www.postgresql.org/docs/current/static/plpgsql-cursors.html

[Sameer Mahajan] hmm... I guess I have to stick with an earlier version for some other reasons, and it doesn't have them...

> > o Berkeley DB has various positioning (e.g. DB_SET_RANGE) as well
> > as operational (e.g. DB_NOOVERWRITE) options in its cursors.
> > Postgres could provide such functionality very easily (e.g. using
> > the underlying btree of an index to give DB_SET_RANGE-like
> > positioning).
>
> Maybe you could explain what the options you find useful are, and
> provide examples as to why existing PostgreSQL functionality doesn't
> do what you need? It might help people understand what you want and
> why.

[Sameer Mahajan] the most useful one that I found was DB_SET_RANGE, which internally can use the efficient btree indexing. Of course I can perform a serial scan in my application to achieve the same result, but it would be a little inefficient.

> > * Configurable error handling control: in case of fatal errors
> > the transaction is aborted by default. It would be useful to make
> > some configurable options available for the application to control
> > the error handling in such situations. Berkeley DB has an option
> > whereby all exceptions can be turned off and the system returns only
> > error codes, which are then required to be handled by the
> > application.
>
> I've mostly worked with Pg via psycopg (Python), PL/PgSQL running
> within Pg, and with JDBC (often wrapped by Hibernate). In all these
> cases error handling has been pretty flexible and easy.
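On the DB_SET_RANGE point above: no serial scan in the application should be needed, because the SQL counterpart of "position the cursor at the smallest key >= K" is an ordered, indexed range query, which the planner answers straight from the btree. A runnable sketch (sqlite3 standing in for any SQL engine here; the table and helper names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(10, "a"), (20, "b"), (30, "c")])

def set_range(conn, key):
    """DB_SET_RANGE analogue: return the row with the smallest key >= key.

    The WHERE k >= ? / ORDER BY k / LIMIT 1 shape lets the engine answer
    this with a single descent into the primary-key btree, just as
    Berkeley DB positions its cursor."""
    return conn.execute(
        "SELECT k, v FROM kv WHERE k >= ? ORDER BY k LIMIT 1",
        (key,)).fetchone()

row = set_range(conn, 15)   # lands on key 20, as DB_SET_RANGE would
```

Declaring a cursor over the same `WHERE k >= ? ORDER BY k` query and fetching from it gives the full "position, then iterate forward" Berkeley DB cursor pattern.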
> I assume what you're really getting at is that you want an implicit
> savepoint before each statement that's released after successful
> statement execution, or rolled back to and then released if the
> statement fails, before error information is reported to the
> application?
>
> If so, while AFAIK there's no built-in facility for this, it's not
> difficult to implement at the application level. Personally I'm
> dubious about the value of including something like that in Pg itself,
> as it encourages apps to throw queries at the DB and see if they work,
> rather than issuing queries that are designed to always succeed. For
> example, rather than (application pseudocode):
>
>   sql.execute("SAVEPOINT x; INSERT INTO tablename (key, value) VALUES (1, 9);");
>   if (sql.error == unique_violation)
>   {
>       sql.execute("ROLLBACK TO SAVEPOINT x; UPDATE tablename SET value = 9 WHERE key = 1;");
>   }
>   sql.execute("RELEASE SAVEPOINT x;");
>
> ... it's probably better to do:
>
>   sql.execute("UPDATE tablename SET value = 9 WHERE key = 1;
>                INSERT INTO tablename (key, value)
>                SELECT 1, 9 WHERE NOT EXISTS
>                  (SELECT 1 FROM tablename WHERE key = 1);");
>
> The above will update the value if it's already present, or failing
> that insert a new record with that value. It can be done in a batch
> with far fewer DB round trips, and you avoid the overhead of creating
> and releasing savepoints all the time.
>
> In this particular case it'd be nice if Pg had a "MERGE" / "UPSERT"
> feature to handle this case, but you can probably see my point.
>
> I've rewritten code that used to use try-and-see (either in client
> code or PL/PgSQL) to work in an always-succeed style and found that it
> often runs literally hundreds of times faster. It's also much more
> amenable to tricks like loading a temp table full of dirty data and
> running a single query over it to do the work you want, which can be
> VASTLY faster than having the app send it all over piecemeal.
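Craig's always-succeed pattern above is plain SQL and easy to try anywhere; here is a runnable version of the same update-then-insert-if-absent logic (sqlite3 standing in for Pg so the example is self-contained, with the two statements split into separate execute() calls only for readability, where Craig batches them into one round trip; note that neither form is safe against concurrent writers without something like MERGE or explicit locking):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (key INTEGER PRIMARY KEY, value INTEGER)")

def upsert(conn, key, value):
    """Always-succeed upsert: update if the key exists, otherwise
    insert. No error inspection, so no savepoint churn."""
    conn.execute("UPDATE tablename SET value = ? WHERE key = ?",
                 (value, key))
    conn.execute(
        "INSERT INTO tablename (key, value) "
        "SELECT ?, ? WHERE NOT EXISTS "
        "  (SELECT 1 FROM tablename WHERE key = ?)",
        (key, value, key))

upsert(conn, 1, 9)    # key absent: the INSERT ... WHERE NOT EXISTS fires
upsert(conn, 1, 42)   # key present: the UPDATE wins, no duplicate row
```

Both calls succeed without raising, which is the point: the application never has to distinguish "insert failed because the row exists" from any other error.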
> Once you're dealing with network latency with remote clients, this
> starts to become very significant.

[Sameer Mahajan] I guess I would buy this argument, since the places where I need it justify the amount of logic I can / need to add in the application layer.

> --
> Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general