On Thursday 29 November 2007 11:14, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:
> > Gregory Stark <stark@xxxxxxxxxxxxxxxx> writes:
> > > "Simon Riggs" <simon@xxxxxxxxxxxxxxx> writes:
> > >> Tom's previous concerns were along the lines of "How would you know
> > >> what to set it to?", given that the planner costs are mostly
> > >> arbitrary numbers.
> > >
> > > Hm, that's only kind of true.
> >
> > The units are not the problem. The problem is that you are staking
> > non-failure of your application on the planner's estimates being
> > pretty well in line with reality. Not merely in line enough that it
> > picks a reasonably cheap plan, but in line enough that if it thinks
> > plan A is 10x more expensive than plan B, then the actual ratio is
> > indeed somewhere near 10.
> >
> > Given that this list spends all day every day discussing cases where
> > the planner is wrong, I'd have to think that that's a bet I wouldn't
> > take.
>
> I think you have a point, but the alternative is often much worse.
>
> If an SQL statement fails because its cost is too high, we can
> investigate the problem and re-submit. If a website slows down because
> somebody allowed a very large query to execute, then everybody is
> affected, not just the person who ran the bad query. Either way the guy
> who ran the query loses, but without constraints in place one guy can
> kill everybody else as well.
>
> > You could probably avoid this risk by setting the cutoff at something
> > like 100 or 1000 times what you really want to tolerate, but how
> > useful is it then?
>
> Still fairly useful, as long as we understand it's a blunt instrument.
>
> If the whole performance of your system depends upon indexed access,
> then rogue queries can have disastrous, unpredictable consequences.
> Many sites construct their SQL dynamically, so a mistake in a
> seldom-used code path can let killer queries through. Even the best
> DBAs have been known to make mistakes.
>

If the whole performance of your system depends upon indexed access, then
maybe you need a database that gives you a way to force index access at the
query level?

> e.g. An 80GB table has 8 million blocks in it.
> - So setting statement_cost_limit = 1 million would allow some fairly
>   large queries but prevent anything that did a SeqScan (or worse).
> - Setting it to 10 million is going to prevent things like sorting the
>   whole table without a LIMIT.
> - Setting it at 100 million is going to prevent unconstrained product
>   joins etc.

I think you're completely overlooking the effect that disk latency has on
query times. We run queries all the time that can vary from 4 hours to 12
hours based solely on the amount of concurrent load on the system, even
though they always plan with the same cost.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
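
To make the thresholds in the quoted example concrete, here is a minimal
sketch. It assumes a hypothetical 80GB table named big_table with a btree
index on its id column; the EXPLAIN step is ordinary PostgreSQL and shows the
planner's total estimated cost (the number such a limit would compare
against), while statement_cost_limit is only the setting proposed in this
thread, not a GUC that exists in stock PostgreSQL, so the SET line merely
illustrates the intended usage.

    -- Ordinary PostgreSQL: inspect the planner's estimated cost for a query.
    -- The cost=startup..total figures are in arbitrary planner units; a
    -- sequential scan plus sort over ~8 million blocks lands far above
    -- 1 million.
    EXPLAIN SELECT * FROM big_table ORDER BY created_at;

    -- Proposed in this thread (not a setting in stock PostgreSQL): refuse to
    -- run any statement whose total planned cost exceeds the limit.
    SET statement_cost_limit = 1000000;

    -- Under such a limit, the unconstrained sort above would be rejected at
    -- plan time, while an indexed point lookup like this would still run.
    SELECT * FROM big_table WHERE id = 42;

As Tom's objection implies, a limit like this is only safe as a coarse safety
net: setting it 100 or 1000 times above what you actually want to tolerate
lowers the risk of rejecting legitimate work, at the price of catching only
the most egregious plans.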