On Fri, Apr 11, 2008 at 2:54 PM, Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Pavan Deolasee wrote:
> > I wonder if it would make sense to add support to mount database in
> > *read-only* mode from multiple servers though. I am thinking about
> > data warehouse kind of operations where multiple servers can be
> > used to answer read-only queries. Is there a use case for such
> > applications in the real world?

Not quite workable. Remember that table data is not always available on the
block device -- there are pages modified in the buffer cache (shared memory),
and the machines have no access to each other's shared memory (and it would be
a lot of work to make that access efficient).

Remember also about MVCC -- if your "read-only copy" machine starts a
complicated query on some big_table, and in the meanwhile the "read-write"
machine decides that big_table's pages can be reused... well, your "read-only"
machine doesn't even have a way of knowing it's returning garbage data. ;-)

Now, if you really, really want a read-only copy of the read-write data
available over the network, many NAS/SAN devices will let you take a snapshot
of the database -- and you can use that snapshot as a read-only copy. But then
again, if you want a read-only copy of a days- or weeks-old database, there
are cheaper and better ways of doing it.

> I'm not sure that makes sense myself. The reason you'd want multiple read
> only instances is for performance and/or redundancy. Neither of those goals
> is well served by having a shared data store.
>
> A shared data store won't help performance much because both instances will
> compete for I/O bandwidth. It might be faster if most of the regularly used
> data and indexes fit in memory on the host, but even then I'd personally be
> surprised if the cost of the network/shared storage didn't counteract that
> at least in part.

That is assuming your bottleneck is the I/O subsystem. If your data fits
nicely in RAM but you are CPU bound, it sometimes makes more sense to have two
machines than one machine that is twice as powerful. It is also easier to add
a third machine later than to buy an even more powerful one. But this is a
field where YMMV.

A known implementation of such a setup would be Oracle RAC, where you have
shared storage and N machines using it.

> For redundancy, you ideally want to avoid shared infrastructure that can
> fail - like shared storage. It's likely to be better to keep separate copies
> of the data store on each host.
>
> There are systems - like Slony-I and log shipping replication - that can
> keep servers in sync without shared storage, and are almost certainly more
> useful than shared-storage DB servers.
>
> What I do think would be very interesting would be the ability to have a DB
> serving read-only queries while still reading in shipped WAL archives as
> they arrive. That'd be a potential big win on performance because each DB
> server could have its own I/O system, CPUs and RAM. With some trickiness
> you could even forward queries that did require writes to the master server
> transparently, while servicing read-only queries locally.

Something like pgpool, which can forward read-write queries to a master and
handle SELECTs on a pool of read-only machines (a rough sketch of that kind
of query routing is in the P.S. below).

While I think pgpool, pgbouncer and Slony-I are great pieces of software, I
would like to wake up one day and know that PostgreSQL can do it all
internally, under the hood, just like it does WAL-logging and startup
recovery automatically. ;-)

Regards,
Dawid
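
P.S. Just to illustrate the kind of read/write query routing I mean above --
a toy sketch in Python, nothing like pgpool's actual implementation; the host
names, DSNs and the naive "starts with SELECT" heuristic are all made up for
the example:

    # Toy read/write splitter -- an illustration only, not how pgpool works.
    # The DSNs below are placeholders for whatever your setup uses.
    import itertools
    import psycopg2

    MASTER_DSN = "host=master dbname=app user=app"
    REPLICA_DSNS = [
        "host=replica1 dbname=app user=app",
        "host=replica2 dbname=app user=app",
    ]

    master = psycopg2.connect(MASTER_DSN)
    master.autocommit = True
    replica_conns = [psycopg2.connect(dsn) for dsn in REPLICA_DSNS]
    for conn in replica_conns:
        conn.autocommit = True
    replicas = itertools.cycle(replica_conns)

    def run(sql, params=None):
        # Crude heuristic: plain SELECTs go to a replica (round-robin),
        # everything else goes to the master.
        is_read_only = sql.lstrip().lower().startswith("select")
        conn = next(replicas) if is_read_only else master
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall() if cur.description else None

A real router would of course also have to care about open transactions and
about SELECTs that call functions with side effects -- which is exactly why
I'd rather have the server handle this under the hood.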