
Re: When should I worry?

On Jun 10, 2007, at 2:14 PM, Joe Conway wrote:


Bill Moran wrote:
Tom Allison <tom@xxxxxxxxxxx> wrote:

If the user base gets to 100 or more, I'll be hitting a billion rows before too long. I add about 70,000 rows per user per day. At 100 users this is 7 million rows per day. I'll hit a billion in 142 days, call it six months for simplicity.

The table itself is small (two columns: bigint, int), but I'm wondering when I'll start to hit a knee in performance and how I can monitor that. I know where I work (day job) they have Oracle tables with a billion rows that just plain suck. I don't know if a billion is bad or if the DBAs were not given the right opportunity to make their tables work.

But if they are any indication, I'll be feeling some hurt when I exceed a billion rows. Am I going to just fold up and die in six months?

A lot depends on your specific use case.

- Will you be just storing the data for archival purposes, or frequently querying the data?

- If you need to run queries, are they well bounded to certain subsets of the data (e.g. a particular range of time for a particular user) or are they aggregates across the entire billion rows?

- Is the data temporal in nature, and if so do you need to purge it after some period of time?

As an example, I have an application with temporal data that needs periodic purging and is typically queried for small time ranges (tens of minutes). We have set up partitioned tables (partitioned by date range and data source -- akin to your users) using constraint exclusion that contain 3 or 4 billion rows (total of all partitions), and we have no problem at all with performance. But I suspect that if we needed to do an aggregate across the entire thing it would not be particularly fast ;-)
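
For reference, a minimal sketch of that kind of setup, using table inheritance plus CHECK constraints so constraint exclusion can skip irrelevant partitions (the table and column names here are invented for illustration):

create table events (
    source_id integer   not null,
    logged_at timestamp not null,
    payload   text
);

create table events_2007_06 (
    check (logged_at >= '2007-06-01' and logged_at < '2007-07-01')
) inherits (events);

create table events_2007_07 (
    check (logged_at >= '2007-07-01' and logged_at < '2007-08-01')
) inherits (events);

-- enable constraint exclusion for the session
set constraint_exclusion = on;

-- for a narrow time-range predicate the planner can rule out
-- events_2007_07 from its CHECK constraint alone
select * from events
 where logged_at >= '2007-06-10 14:00'
   and logged_at <  '2007-06-10 14:30';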

Why not just create a simulation of 100 users and run it as hard as you can until it starts to degrade? Then you'll have some real-world experience to tell you how much you can handle.

This is good advice. Without much more detail, folks on the list won't be able to help much, but with a simulation such as this you can answer your own question...
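
As a rough illustration only (not something from the thread), generate_series makes it fairly painless to fake one day of that load, 100 users times 70,000 rows each, against a throwaway table shaped like the one described above:

-- throwaway table matching the "two columns: bigint, int" description
create table sim (big_id bigint not null, small_id integer not null);

-- one simulated day: 100 users x 70,000 rows = 7 million rows
insert into sim (small_id, big_id)
select u, (u::bigint - 1) * 70000 + i
  from generate_series(1, 100)   as u,
       generate_series(1, 70000) as i;

Repeat the insert and time the application's queries as the row count climbs, and you get real numbers instead of guesses.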

Joe



Good questions. I guess there are two answers. There are times when I will want aggregate data and I'm not as concerned about the execution time. But there are other queries that are part of the application design. These are always going to be of a type where I know a single specific primary key value and I want to find all the rows that are related.


The first table has one column:
  idx serial primary key
and the third table has one column:
  idx bigserial primary key

The second table (the billion-row table) consists of two columns with a composite primary key:
  first_idx integer not null references first(idx) on delete cascade,
  third_idx bigint not null references third(idx) on delete cascade,
  constraint pkey_first_third primary key (first_idx, third_idx)
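
Spelled out as DDL it would be roughly the following (a sketch only; the string column on third is inferred from the query below, and any other columns are omitted):

create table first (
    idx serial primary key
);

create table third (
    idx    bigserial primary key,
    string text  -- inferred from "select t.string" below
);

create table second (
    first_idx integer not null references first(idx) on delete cascade,
    third_idx bigint  not null references third(idx) on delete cascade,
    constraint pkey_first_third primary key (first_idx, third_idx)
);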

The common query will be:

select t.string
  from first f, second s, third t
 where f.idx = s.first_idx
   and s.third_idx = t.idx
   and f.idx = 4;   -- or whatever

So I think the answer is that the data isn't going to be temporal or otherwise segregated into subsets.
I'll assume this is a lead-in to partitioning?
The data will be queried very frequently; plan on a query roughly every 10 seconds, and I don't know what idx ranges will be involved. Would it be possible to partition this by the first_idx value? Would it be an improvement?
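
For what it's worth, a sketch of what that might look like with the same inheritance-plus-CHECK-constraint approach described above (ranges picked arbitrarily here, and the existing rows would have to be moved into the children):

create table second_users_1_50 (
    check (first_idx between 1 and 50)
) inherits (second);

create table second_users_51_100 (
    check (first_idx between 51 and 100)
) inherits (second);

-- note: indexes and the primary key are not inherited, so each child
-- needs its own, plus a trigger or rule to route new inserts

-- with constraint_exclusion = on and a where clause that pins
-- s.first_idx to one value, the planner can skip the other children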

