Re: Databases vs Schemas

Scott Carey <scott@xxxxxxxxxxxxxxxxx> · Fri, 9 Oct 2009 19:50:42 -0700

On 10/9/09 2:02 PM, "Merlin Moncure" <mmoncure@xxxxxxxxx> wrote:

> On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis <scott.otis@xxxxxxxxxx> wrote:
>> Over the next couple of months we will be creating an instance of our
>> solution for each public school district in the US which is around 18,000. 
>> That means currently we would be creating 18,000 databases (all on one server
>> right now  which is running 8.4).  I am assuming this is probably not the
>> best way of doing things.
> 
> Schema advantages:
> *) maintenance advantages; all functions/trigger functions can be
> shared.  HUGE help if you use them
> *) can query shared data between schemas without major headaches
> *) a bit more efficiency especially if private data areas are small.
> kinda analogous to processes vs threads
> *) Can manage the complete system without changing database sessions.
> This is the kicker IMO.
> 
> Database Advantages:
> *) More discrete.  Easier to distinctly create, dump, drop, or move to
> separate server
> *) Smaller system catalogs might give efficiency benefits
> 

I'm concerned how a system with 57 * 18000 > 1M tables will function.

I've got 200,000 tables in one db (8.4), and some tools barely work.  The
system catalogs get inefficient when large and psql especially has trouble.
Tab completion takes forever, even if I make a schema "s" with one table in
it and type "s." and try and tab complete -- its as if its scanning all
without a schema qualifier or using an index.  Sometimes it does not match
valid tables at all, and sometimes regex matching fails too ('\dt
schema.*_*_*' intermittently flakes out if it returns a lot of matches).
Other than that the number of tables doesn't seem to cause much performance
trouble.  The only exception is constraint exclusion which is fundamentally
broken with too many tables on the performance and memory consumption side.

Having a lot of tables really makes me wish VACUUM, ANALYZE, and other
maintenance tools could partially matched object names with regex though.

On the other hand, lots of databases probably has performance drawbacks too.
And its maintenance drawbacks are even bigger.

I certainly don't see any reason at all to try and put all of these in one
schema.  The only useful choices are schemas vs databases.  I'd go for
schemas unless the performance issues there are a problem.   Schemas can be
dumped/restored/backed up independent of one another easily too.

> merlin
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance