Re: Databases vs Schemas

Merlin Moncure <mmoncure@xxxxxxxxx> · Sat, 10 Oct 2009 09:26:19 -0400

On Fri, Oct 9, 2009 at 10:50 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
> On 10/9/09 2:02 PM, "Merlin Moncure" <mmoncure@xxxxxxxxx> wrote:
>
>> On Fri, Oct 9, 2009 at 1:46 PM, Scott Otis <scott.otis@xxxxxxxxxx> wrote:
>>> Over the next couple of months we will be creating an instance of our
>>> solution for each public school district in the US which is around 18,000.
>>> That means currently we would be creating 18,000 databases (all on one server
>>> right now  which is running 8.4).  I am assuming this is probably not the
>>> best way of doing things.
>>
>> Schema advantages:
>> *) maintenance advantages; all functions/trigger functions can be
>> shared.  HUGE help if you use them
>> *) can query shared data between schemas without major headaches
>> *) a bit more efficiency especially if private data areas are small.
>> kinda analogous to processes vs threads
>> *) Can manage the complete system without changing database sessions.
>> This is the kicker IMO.
>>
>> Database Advantages:
>> *) More discrete.  Easier to distinctly create, dump, drop, or move to
>> separate server
>> *) Smaller system catalogs might give efficiency benefits
>>
>
> I'm concerned how a system with 57 * 18000 > 1M tables will function.
>
> I've got 200,000 tables in one db (8.4), and some tools barely work.  The
> system catalogs get inefficient when large and psql especially has trouble.
> Tab completion takes forever, even if I make a schema "s" with one table in
> it and type "s." and try and tab complete -- its as if its scanning all
> without a schema qualifier or using an index.  Sometimes it does not match
> valid tables at all, and sometimes regex matching fails too ('\dt
> schema.*_*_*' intermittently flakes out if it returns a lot of matches).
> Other than that the number of tables doesn't seem to cause much performance
> trouble.  The only exception is constraint exclusion which is fundamentally
> broken with too many tables on the performance and memory consumption side.
>
> Having a lot of tables really makes me wish VACUUM, ANALYZE, and other
> maintenance tools could partially matched object names with regex though.
>
> On the other hand, lots of databases probably has performance drawbacks too.
> And its maintenance drawbacks are even bigger.
>
> I certainly don't see any reason at all to try and put all of these in one
> schema.  The only useful choices are schemas vs databases.  I'd go for
> schemas unless the performance issues there are a problem.   Schemas can be
> dumped/restored/backed up independent of one another easily too.

They can, but: drop schema foo cascade; is a different operation than:
drop database foo;  The first is kinda surgical and the second is a
rocket launcher.  What would you rather have in battle?

For the record, just about every database I've ever designed has had
some of what I call 'de facto table partitioning' using
schemas/search_path tricks.  I'm working on a system right now that is
going to get very large and if I started to run into psql problems I'd
probably look at patching it, maybe \set an option  to simplify some
of the queries.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance