Re: Thousands of tables versus one table?

Thomas Andrews wrote:


On 6/5/07 12:48 PM, "Scott Marlowe" <smarlowe@xxxxxxxxxxxxxxxxx> wrote:

david@xxxxxxx wrote:
On Mon, 4 Jun 2007, Scott Marlowe wrote:

Gregory Stark wrote:
 "Thomas Andrews" <tandrews@xxxxxxxxxxxxxxxxxxxxx> writes:


I guess my real question is, does it ever make sense to create
thousands of tables like this?

Sometimes. But usually it's not a good idea.

What you're proposing is basically partitioning, though you may not
actually need to put all the partitions together for your purposes.
Partitioning's main benefit is in the management of the data.  You can
drop and load partitions in chunks rather than have to perform large
operations on millions of records.

Postgres doesn't really get any faster by breaking the tables up like
that.  In fact it probably gets slower as it has to look up which of
the thousands of tables you want to work with.

That's not entirely true.  PostgreSQL can be markedly faster using
partitioning as long as you always access it by referencing the
partitioning key in the WHERE clause.  So, if you partition the table
by date, and always reference it with a date in the WHERE clause, it
will usually be noticeably faster.  OTOH, if you access it without
using a WHERE clause that lets it pick partitions, then it will be
slower than one big table.

So, while this poster might originally think to have one table for
each user, resulting in thousands of tables, maybe a compromise where
you partition on userid ranges would work out well, and keep each
partition table down to some 50-100 thousand rows, with smaller
indexes to match.

What if he doesn't use the Postgres internal partitioning, but instead
makes his code access tables named responsesNNNNN, where NNNNN is the
id of the customer?

This is what it sounded like he was asking initially.
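
If he goes that route, the application has to create and target each
table itself.  A minimal sketch, assuming an existing responses table
to copy the layout from (names hypothetical):

   -- create the per-customer table when the customer is added
   CREATE TABLE responses12345 (LIKE responses INCLUDING DEFAULTS);

   -- the application then builds the table name into each query
   SELECT * FROM responses12345;
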
Sorry, I think I initially read your response as "Postgres doesn't
really get any faster by breaking the tables up" without the "like that"
part.

I've found that as long as the number of tables is > 10,000 or so,
having a lot of tables doesn't seem to really slow pgsql down a lot.

(That should have been "as long as the number of tables is < 10,000 or
so"...)

I'm sure that the tipping point is dependent on your db machine.  I
would bet that if he's referring to individual tables directly, and each
one has hundreds instead of millions of rows, the performance would be
better.  But the only way to be sure is to test it.
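
EXPLAIN ANALYZE is the easy way to run that test.  A hypothetical
comparison, same lookup against both layouts (table names as in the
sketches above):

   -- one big (or partitioned) table, filtered by customer
   EXPLAIN ANALYZE SELECT count(*) FROM responses WHERE userid = 12345;

   -- the per-customer table for the same customer
   EXPLAIN ANALYZE SELECT count(*) FROM responses12345;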

Please stop top-posting.  This is my last reply until you stop
top-posting.

So, partitioning in PostgreSQL 8 is workable, but breaking the table up
into actual separate tables is not?

Ummm, that's not what I said.  They're similar in execution.  However,
partitioning might let you put 100 customers into a given table, if,
say, you partitioned on customer ID or something that would allow you
to group a few together.

Another solution we have proposed is having 'active' and 'completed' tables.
So, rather than thousands, we'd have four tables:

   responders_active
   responders_completed
   responses_active
   responses_completed

That's not a bad idea.  Just keep up on your vacuuming.
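
Something like this periodic sweep, say (untested, and assuming a
boolean completed flag on the active table):

   BEGIN;
   -- move finished rows over to the completed table
   INSERT INTO responses_completed
       SELECT * FROM responses_active WHERE completed;
   DELETE FROM responses_active WHERE completed;
   COMMIT;

   -- reclaim the dead tuples the DELETE leaves behind
   VACUUM ANALYZE responses_active;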

