On Thu, 23 Aug 2012 17:56:37 -0700 Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

> On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> wrote:
> > In response to "Martin French" <Martin.French@xxxxxxxxxxxxx>:
> >> >
> >> > I have a table with 40 million rows and haven't had any performance
> >> > issues yet.
> >> >
> >> > Are there any rules of thumb as to when a table starts getting too big?
> >> >
> >> > For example, maybe if the index size is 6x the amount of ram, if the
> >> > table is 10% of total disk space, etc?
> >>
> >> My rule here is that a table is too big when performance starts degrading beyond an acceptable level.
> >
> > The challenge there is that if you wait until performance degrades
> > beyond an acceptable level, you've allowed yourself to get into a
> > situation where clients are upset and frustrated, and fixing the
> > problem is difficult because there's so much data to manipulate to
> > rearrange things.
>
> Yes, I agree with that.
>
> > And the advice I have along those lines is to establish now what
> > constitutes unacceptable performance, and put some sort of monitoring
> > and tracking in place to know what your performance degradation looks
> > like and predict when you'll have to react. For example, a MRTG
> > graph that runs an experimental query once a day during off hours and
> > graphs the time it takes vs. the # of rows in the table will prove
> > a valuable tool that can sometimes predict exactly when you'll have
> > to change things before it becomes a problem.
>
> This seems inconsistent with your previous advice. By the time your
> experimental query shows a problem, you no longer have any maintenance
> windows left large enough to fix it. Unless your experimental query
> was a reindex or something non-production like that, in which case
> running it on a production server, even off-hours, doesn't seem like a
> good idea.

Perhaps I didn't explain the approach sufficiently.

If you can establish something like "this specific SELECT has to run in under 5 minutes to meet the client's expectations," you can then time how long that query takes each time it's run: by capturing that information in the application, by running the query in some automated fashion, or by other methods.

If you capture that runtime on a regular basis and graph it alongside other relevant data, such as the number of rows in the related tables and the size of the data, you quickly get a good picture of how fast things are growing. Frequently you can project the line into the future and say things like, "If we don't come up with a better way to do this by September of next year, we're going to exceed our allowed run time." You can then take that very detailed information to the business planners and point out that they need to schedule developer time _before_ then if they don't want the application to slow down below the acceptable level. (A rough sketch of the automated approach is at the end of this message.)

Unless you work somewhere with unlimited resources, your time is always split between feature requests, day-to-day operations, firefighting, and so on. In my experience, keeping things like this under control is often a matter of having enough information to justify why your optimization project is more important than whizbang feature X that marketing wants so badly. Of course, if you do work somewhere with unlimited resources, let me know so I can send in my resume.
And none of what I'm suggesting is intended to belittle the other suggestions, either. If you know of a way to optimize the data better, why not do it now? If you can purge old data, why wait until performance is a problem to start purging? This is just another trick to have in your bag.

To make the monitoring idea concrete, here is a rough sketch of the kind of automated capture I mean. The connection info, table name, query, and log path are all made-up placeholders, and I'm using Python with psycopg2 only because it's handy; the same thing works in any language or even a shell script around psql. The point is just: time the representative query, record the table's row count, and append both to a file you can graph.

#!/usr/bin/env python
# Rough sketch: time a representative query and log the result so it can
# be graphed over time (MRTG, gnuplot, a spreadsheet, whatever you like).
# DSN, QUERY, and the table name are placeholders -- substitute whatever
# "experimental query" matters to your application.

import csv
import time
from datetime import datetime

import psycopg2

DSN = "dbname=mydb host=localhost"   # placeholder connection info
QUERY = "SELECT count(*) FROM big_table WHERE created_at > now() - interval '1 day'"
LOGFILE = "/var/log/query_timing.csv"

def main():
    conn = psycopg2.connect(DSN)
    try:
        cur = conn.cursor()

        # Time the representative query.
        start = time.time()
        cur.execute(QUERY)
        cur.fetchall()
        elapsed = time.time() - start

        # Grab the planner's row estimate so growth can be graphed
        # alongside the runtime (much cheaper than a second count(*)).
        cur.execute("SELECT reltuples::bigint FROM pg_class WHERE relname = %s",
                    ("big_table",))
        rows = cur.fetchone()[0]
    finally:
        conn.close()

    # Append one line per run; graph elapsed vs. rows over time.
    with open(LOGFILE, "a") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), rows, round(elapsed, 3)])

if __name__ == "__main__":
    main()

Run something like that from cron once a day during off hours and the resulting CSV gives you the trend line you can take to the planners.

-- 
Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>