On Fri, Apr 11, 2008 at 2:54 PM, Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Pavan Deolasee wrote:
> > I wonder if it would make sense to add support to mount database in
> > *read-only* mode from multiple servers though. I am thinking about
> > data warehouse kind of operations where multiple servers can be
> > used to answer read-only queries. Is there a use case for such
> > applications in the real world?

Not quite workable. Remember that table data is not always available on the
block device -- there are pages modified in the buffer cache (shared memory),
and the machines have no access to each other's shared memory (and it would be
a lot of work to make that access efficient).

Remember also about MVCC -- if your "read-only copy" machine starts a
complicated query on some big_table, and in the meanwhile the "read-write"
machine decides that big_table's pages can be reused... well, your "read-only"
machine doesn't even have a way of knowing it's returning garbage data. ;-)

Now, if you really, really want a read-only copy of the read-write data
available over the network, many NAS/SAN devices will let you take a snapshot
of the database -- and you can use that snapshot as a read-only copy. But then
again, if you want a read-only copy of a days- or weeks-old database, there
are cheaper and better ways of doing it.

> I'm not sure that makes sense myself. The reason you'd want multiple read
> only instances is for performance and/or redundancy. Neither of those goals
> is well served by having a shared data store.
>
> A shared data store won't help performance much because both instances will
> compete for I/O bandwidth. It might be faster if most of the regularly used
> data and indexes fit in memory on the host, but even then I'd personally be
> surprised if the cost of the network/shared storage didn't counteract that
> at least in part.

That is assuming your bottleneck is the I/O subsystem. If your data fits
nicely in RAM but you are CPU bound, it sometimes makes more sense to have two
machines than one machine that is twice as powerful. It is also easier to add
a third machine later than to buy an even more powerful one. But this is a
field where YMMV.

A known implementation of such a setup would be Oracle RAC, where you have
shared storage and N machines using it.

> For redundancy, you ideally want to avoid shared infrastructure that can
> fail - like shared storage. It's likely to be better to keep separate copies
> of the data store on each host.
>
> There are systems - like Slony-I and log shipping replication - that can
> keep servers in sync without shared storage, and are almost certainly more
> useful than shared-storage DB servers.
>
> What I do think would be very interesting would be the ability to have a DB
> serving read-only queries while still reading in shipped WAL archives as
> they arrive. That'd be a potential big win on performance because each DB
> server could have its own I/O system, CPUs and RAM. With some trickiness
> you could even forward queries that did require writes to the master server
> transparently, while servicing read-only queries locally.

Something like pgpool, which can forward read-write queries to a master and
handle SELECTs on a pool of read-only machines (a rough sketch of that kind
of query routing is in the P.S. below).

While I think pgpool, pgbouncer and Slony-I are great pieces of software, I
would like to wake up one day and know that PostgreSQL can do it all
internally, under the hood, just like it does WAL-logging and startup
recovery automatically. ;-)

Regards,
Dawid
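
P.S. Just to illustrate the kind of read/write query routing I mean above --
a toy sketch in Python, nothing like pgpool's actual implementation; the host
names, DSNs and the naive "starts with SELECT" heuristic are all made up for
the example:

    # Toy read/write splitter -- an illustration only, not how pgpool works.
    # The DSNs below are placeholders for whatever your setup uses.
    import itertools
    import psycopg2

    MASTER_DSN = "host=master dbname=app user=app"
    REPLICA_DSNS = [
        "host=replica1 dbname=app user=app",
        "host=replica2 dbname=app user=app",
    ]

    master = psycopg2.connect(MASTER_DSN)
    master.autocommit = True
    replica_conns = [psycopg2.connect(dsn) for dsn in REPLICA_DSNS]
    for conn in replica_conns:
        conn.autocommit = True
    replicas = itertools.cycle(replica_conns)

    def run(sql, params=None):
        # Crude heuristic: plain SELECTs go to a replica (round-robin),
        # everything else goes to the master.
        is_read_only = sql.lstrip().lower().startswith("select")
        conn = next(replicas) if is_read_only else master
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall() if cur.description else None

A real router would of course also have to care about open transactions and
about SELECTs that call functions with side effects -- which is exactly why
I'd rather have the server handle this under the hood.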