Re: Thousands of tables versus one table?

On Tue, 5 Jun 2007, Gregory Stark wrote:

"Scott Marlowe" <smarlowe@xxxxxxxxxxxxxxxxx> writes:

Sorry, I think I initially read your response as "Postgres doesn't really get
any faster by breaking the tables up" without the "like that" part.

Well, breaking up the tables like that or partitioning should be about
equivalent either way. Breaking up the tables and doing it in the application
should perform even better, but it does make the schema less flexible and
harder to do non-partition-based queries and so on.

But he said in the initial message that they don't do cross-customer reports, so there really isn't any non-partition-based querying going on anyway.

I guess I should explain what I originally meant: a lot of people come from a
flat-file world and assume that things get slower when you deal with large
tables. In fact, due to the magic of log(n), accessing records from a large
index is faster than first looking up the table and index info in a small
index and then doing a second lookup in an index for a table half the size.

However, if your query plan ever does a sequential scan of a table, then you are not doing a log(n) lookup, are you?
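
For illustration, a minimal sketch of the two cases, assuming a single large table with a customer column (the table and column names here are hypothetical, not from the original thread):

  -- one big table holding every customer's rows
  CREATE TABLE orders (
      customer_id integer NOT NULL,
      order_date  date    NOT NULL,
      amount      numeric
  );
  CREATE INDEX orders_customer_idx ON orders (customer_id, order_date);

  -- a selective, indexed predicate can stay at log(n) cost:
  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

  -- but a predicate the index can't help with falls back to a
  -- sequential scan of the whole table:
  EXPLAIN SELECT * FROM orders WHERE amount > 1000;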

Where the win in partitioning comes in is being able to make some of the data
disappear entirely. By making part of the index key implicit in the choice of
partition you get away with a key that's half as large. And in some cases you
can get away with using a different key entirely, one which wouldn't otherwise
have been feasible to index. In some cases you can even do sequential scans
whereas in an unpartitioned table you would have to use an index (or scan the
entire table).
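
To make that concrete with the inheritance-style partitioning Postgres offers (building on the hypothetical orders table sketched above), a CHECK constraint on each child lets the planner exclude every partition that can't match:

  -- one child table per customer, each carrying a CHECK constraint
  -- describing what it holds:
  CREATE TABLE orders_cust_0001 (
      CHECK (customer_id = 1)
  ) INHERITS (orders);
  CREATE TABLE orders_cust_0002 (
      CHECK (customer_id = 2)
  ) INHERITS (orders);

  SET constraint_exclusion = on;

  -- only orders_cust_0001 is scanned; the other children are excluded
  -- at plan time, so their data effectively disappears from the query:
  EXPLAIN SELECT * FROM orders WHERE customer_id = 1;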

But the real reason people partition data is for the ease of management.
Being able to drop and load entire partitions in O(1) makes it feasible to
manage data on a scale that would simply be impossible without partitioned
tables.
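
For example, continuing the hypothetical schema above, retiring or bulk-loading one customer's worth of data becomes a metadata operation rather than a multi-million-row DELETE or INSERT (the file path below is made up):

  -- dropping one customer's data: no row-by-row DELETE, no vacuum debt
  DROP TABLE orders_cust_0002;

  -- loading a new customer: build and fill the child on its own,
  -- without touching the other partitions
  CREATE TABLE orders_cust_2000 (
      CHECK (customer_id = 2000)
  ) INHERITS (orders);
  COPY orders_cust_2000 FROM '/tmp/customer_2000.csv' WITH CSV;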

Remember that the original question wasn't about partitioned tables; it was about the performance problem he was having with one large table (slow insert speed), and whether Postgres would collapse if he changed his schema to use a separate table per customer.

I see many cases where people advocate collapsing databases/tables together by adding a column that indicates which customer each row belongs to.

However, I really don't understand why it is more efficient to have a 5B-row table that you run a report/query against 0.1% of than to have 1000 different tables of 5M rows each and run a report/query against 100% of one of them. It would seem that having to skip over 99.9% of the data to find the rows that _may_ be relevant would have a noticeable cost in and of itself.
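
Put as two queries (again with made-up names), the comparison is between:

  -- (a) one 5B-row table: the report walks an index on customer_id and
  --     touches roughly 0.1% of the rows
  SELECT sum(amount) FROM orders WHERE customer_id = 42;

  -- (b) one ~5M-row table per customer: the same report reads 100% of a
  --     much smaller table and needs no customer_id qualifier at all
  SELECT sum(amount) FROM orders_cust_0042;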

David Lang

