Re: Async IO HTTP server frontend for PostgreSQL

Allan Kamau <kamauallan@xxxxxxxxx> · Wed, 10 Sep 2014 23:29:47 +0300

Dear Dmitriy,
To add to David's suggestions: data caching is a difficult task to
undertake. Consider a case where your data may not all fit into
memory. If you cache the data outside PostgreSQL, you need to deal
with memory management, with concurrent population of the cache, and
with a way to keep the cached data fresh as the underlying data
changes. These are not trivial tasks, and the database community
has spent years constructing and improving algorithms to do this on
behalf of the front-end database application developer. Also, each
time a TCP connection is created, the OS and the database server
consume additional compute resources (PostgreSQL starts a separate
backend process for every connection).
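To make those three concerns concrete, here is a minimal Python sketch of an application-side cache. It is an illustration only, not a recommendation. `TTLCache` and the `loader` callback (standing in for a query against PostgreSQL) are hypothetical names. LRU eviction stands in for memory management, and a per-key lock stands in for coordinating concurrent population. A time-to-live only bounds staleness. It does not track real changes, which would need something like PostgreSQL's LISTEN/NOTIFY.

```python
import threading
import time
from collections import OrderedDict


class TTLCache:
    """Hypothetical application-side cache (illustration only)."""

    def __init__(self, loader, max_entries, ttl_seconds):
        self._loader = loader            # hypothetical: runs the database query on a miss
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._data = OrderedDict()       # key -> (value, expires_at), least recently used first
        self._lock = threading.Lock()    # guards _data, _key_locks and loads
        self._key_locks = {}             # one lock per key ever requested (never pruned here)
        self.loads = 0

    def _fresh(self, key):
        """Return (True, value) for an unexpired entry; caller must hold self._lock."""
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self._data.move_to_end(key)  # mark as most recently used
            return True, entry[0]
        return False, None

    def get(self, key):
        with self._lock:
            hit, value = self._fresh(key)
            if hit:
                return value
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Concurrent population: threads missing the same key queue on its lock,
        # and all but the first find the freshly loaded entry on re-check.
        with key_lock:
            with self._lock:
                hit, value = self._fresh(key)
                if hit:
                    return value
            value = self._loader(key)    # slow database round trip, outside the global lock
            with self._lock:
                self.loads += 1
                self._data[key] = (value, time.monotonic() + self._ttl)
                self._data.move_to_end(key)
                # Memory management: evict least recently used entries beyond the limit.
                while len(self._data) > self._max_entries:
                    self._data.popitem(last=False)
            return value


cache = TTLCache(lambda key: key.upper(), max_entries=2, ttl_seconds=60)
print(cache.get("a"))  # prints A
```

Even this toy version keeps one lock object per distinct key forever. Bounding that as well is left out, which shows how quickly the details pile up.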

A simpler way would be to use connection pooling, where a thread of
your application "borrows" a connection from a pool of open
connections, executes the SQL command, and returns the connection to
the pool as soon as the command completes. This requires only a few
concurrent database connections (depending on configuration) and lets
the database do the data caching for you. For effective caching you
may need to adjust the PostgreSQL configuration file (postgresql.conf,
for example shared_buffers) as well as the operating system's resource
configuration. This way the response time of your client application
will degrade gracefully as the number of concurrent client requests
increases.
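The borrow/execute/return cycle described above can be sketched roughly as follows. This Python illustration uses hypothetical names. `FakeConnection` stands in for a real driver connection, and `ConnectionPool` is a toy fixed-size pool built on `queue.Queue`, not any particular pooling library. Twenty request threads share two connections, and each thread holds a connection only for the duration of one command.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real PostgreSQL driver connection."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def execute(self, sql):
        # A real connection would send `sql` to the server; here we just echo it.
        return f"conn{self.conn_id}: {sql}"


class ConnectionPool:
    """Toy fixed-size pool: connections are opened once and reused."""

    def __init__(self, size):
        self.size = size
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))
        self._lock = threading.Lock()
        self.in_use = 0        # connections currently borrowed
        self.max_in_use = 0    # high-water mark, to show the limit is respected

    @contextmanager
    def borrow(self):
        conn = self._idle.get()  # blocks while every connection is busy
        with self._lock:
            self.in_use += 1
            self.max_in_use = max(self.max_in_use, self.in_use)
        try:
            yield conn
        finally:
            with self._lock:
                self.in_use -= 1
            self._idle.put(conn)  # hand the connection back as soon as the command is done


def handle_request(pool, request_id, results):
    # Hold a connection only for the duration of one SQL command.
    with pool.borrow() as conn:
        results.append(conn.execute(f"SELECT {request_id}"))


pool = ConnectionPool(size=2)
results = []
threads = [threading.Thread(target=handle_request, args=(pool, i, results)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), pool.max_in_use)
```

In production one would normally use an existing pooler (such as PgBouncer) or the pool built into the client library rather than hand-rolling one.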
For a small number of concurrent connections, any speed advantage a
direct "streaming" solution may have over a traditional
connection-pooling solution will hardly be noticeable to the end
user. The easier way to improve response time is to look into
PostgreSQL performance tuning and to invest in faster hardware
(mainly the disk subsystem, and more RAM).

Regards,
Allan.

On Wed, Sep 10, 2014 at 8:25 PM, Dmitriy Igrishin <dmitigr@xxxxxxxxx> wrote:
Hello, Steve

2014-09-10 21:08 GMT+04:00 Steve Atkins <steve@xxxxxxxxxxx>:

On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr@xxxxxxxxx> wrote:

> Hello, David
>
> 2014-09-10 4:31 GMT+04:00 David Boreham <david_list@xxxxxxxxxxx>:
> Hi Dmitriy, are you able to say a little about what's driving your quest for async http-to-pg?
> I'm curious as to the motivations, and whether they match up with some of my own reasons for wanting to use low-thread-count solutions.
>
> For many web projects I consider Postgres a development platform. Thus,
> I prefer to keep the business logic (data integrity trigger functions and
> API functions) in the database. Because of the nature of the Web, many concurrent
> clients can request a site, and I want to serve as many of them as possible with
> minimal overhead. I also want to avoid complex solutions. So I believe that
> with an asynchronous solution it's possible to *stream* the data from the database
> to the maximum number of clients (who may request my site over a
> slow connection).

That's going to require you to have one database connection open for each
client. If the client is on a slow connection, it'll keep the database connection
open far longer than is needed (compared to the usual "pull data from the
database as fast as the disks will go, then spoonfeed it out to the slow client"
approach). Requiring a live database backend for every open client connection
doesn't seem like a good idea if you're supporting many slow concurrent clients.

Good point. So some caching should be implemented on the HTTP server side,
then.

-- 
// Dmitriy.
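Steve's "pull fast, then spoonfeed" approach, combined with the HTTP-side buffering Dmitriy mentions, can be sketched with Python's asyncio. The sketch makes several assumptions. `fetch_all_rows` is a hypothetical stand-in for a real asynchronous PostgreSQL query. `asyncio.Semaphore(2)` models a pool of two database connections. `asyncio.sleep` simulates both the fast database read and the slow client link. The point is ordering: each client's rows are fully buffered, and the connection is released before the first chunk goes out.

```python
import asyncio


async def fetch_all_rows(db_slot, query):
    """Hypothetical database call: holds a pooled connection only while rows are read."""
    async with db_slot:                  # borrow one of the few DB connections; released on exit
        await asyncio.sleep(0.01)        # stands in for a fast read from PostgreSQL
        return [f"row {i} of {query}" for i in range(5)]


async def spoonfeed(rows, send_chunk, chunk_size=2):
    """Send buffered rows to a (possibly slow) HTTP client; no DB connection is held."""
    for start in range(0, len(rows), chunk_size):
        await send_chunk(rows[start:start + chunk_size])


async def serve_client(db_slot, client_id, log):
    rows = await fetch_all_rows(db_slot, f"client {client_id}")
    log.append(("released", client_id))

    async def slow_send(chunk):
        await asyncio.sleep(0.05)        # simulates a slow network link
        log.append(("sent", client_id, len(chunk)))

    await spoonfeed(rows, slow_send)


async def main():
    db_slot = asyncio.Semaphore(2)       # models a pool of two DB connections shared by all clients
    log = []
    await asyncio.gather(*(serve_client(db_slot, i, log) for i in range(10)))
    return log


log = asyncio.run(main())
```

Ten slow clients are served while at most two database connections are ever in use. The slow part of each request happens after its connection has been returned.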