Re: Feature Request --- was: PostgreSQL Performance Tuning

Dan Harris <fbsd@xxxxxxxxxxxxxxx> · Fri, 27 Apr 2007 14:27:51 -0600

Bill Moran wrote:
In response to Dan Harris <fbsd@xxxxxxxxxxxxxxx>:
<snip>
Why does the user need to manually track max_fsm_pages and max_fsm_relations?  I 
bet there are many users who have never taken the time to understand what this 
means and wondering why performance still stinks after vacuuming their database 
( spoken from my own experience )

But there are two distinct routes that can be taken if there's not enough
fsm space: add fsm space or vacuum more frequently.  I don't want the system
to eat up a bunch of memory for fsm entries if my workload indicates that
I can easily vacuum more frequently.

There's no magic bullet here, but heuristics should be able to tell us you can 
"easily vacuum more frequently"  And again, I said these things would be 
*optional*.  Like an item in postgresql.conf 
"i_have_read_the_manual_and_know_what_this_all_means = false #default false". 
If you change it to true, you have all the control you're used to and nothing 
will get in your way.

How about work_mem?  shared_buffers?  column statistics sizes? random_page_cost?

The only one that seems practical (to me) is random_page_cost.  The others are
all configuration options that I (as a DBA) want to be able to decide for
myself.  For example, I have some dedicated PG servers that I pretty much
max those values out at, to let PG know that it can use everything on the
system -- but I also have some shared use machines with PG, where I carefully
constrain those values so that PG doesn't muscle other daemons out of their
share of the RAM (work_mem is probably the best example)

Just because you carefully constrain it does not preclude the ability for 
program logic to maintain statistics to do what I suggested.

It would be nice to have some kind of utility that could tell me what
random_page_cost should be, as I've never felt comfortable tweaking it.
Like some utility to run that would say "based on the seek tests I just
ran, you should set random_page_cost to x".  Of course, if such a thing
existed, it could just fill in the value for you.  But I haven't figured
out how to pick a good value for that setting, so I have no idea how to
suggest to have it automatically set.

Me either, but I thought if there's a reason it's user-settable, there must be 
some demonstrable method for deciding what is best.

Couldn't some fairly simple regression tests akin to a VACUUM process spot 
potential problems?  "Hey, it looks like you need more fsm_relations.. I bumped 
that up automatically for you".  Or "These indexes look bloated, shall I 
automatically reindex them for you?"

A lot of that stuff does happen.  A vacuum verbose will tell you what it
thinks you should do, but I don't _want_ it to do it automatically.  What
if I create huge temporary tables once a week for some sort of analysis that
overload the fsm space?  And if I'm dropping those tables when the analysis
is done, do I want the fsm space constantly adjusting?

I understand *you* don't want it done automatically.  But my suspicion is that 
there are a lot more newbie pg admins who would rather let the system do 
something sensible as a default.  Again, you sound defensive that somehow my 
ideas would take power away from you.  I'm not sure why that is, but certainly 
I'm not suggesting that.  An auto-pilot mode is not a bad idea just because a 
few pilots don't want to use it.

Plus, some is just impossible.  shared_buffers requires a restart.  Do you
want your DB server spontaneously restarting because it thought more
buffers might be nice?

Well, maybe look at the bigger picture and see if it can be fixed to *not* 
require a program restart?  Or.. take effect on the next pid that gets created? 
 This is a current limitation, but doesn't need to be one for eternity does it?

I'm sure there are many more examples, that with some creative thinking, could 
be auto-adjusted to match the usage patterns of the database. PG does an 
excellent job of exposing the variables to the users, but mostly avoids telling 
the user what to do or doing it for them.  Instead, it is up to the user to know 
where to look, what to look for, and how to react to things to improve 
performance.  This is not all bad, but it is assuming that all users are hackers 
( which used to be true ), but certainly doesn't help when the average SQLServer 
admin tries out Postgres and then is surprised at the things they are now 
responsible for managing.  PG is certainly *not* the only database to suffer 
from this syndrome, I know..

I expect the suffering is a result of the fact that databases are non-trivial
pieces of software, and there's no universally simple way to set them up
and make them run well.

Speaking as a former SQL Server admin ( from day 1 of the Sybase fork up to 
version 2000 ), I can say there *is* a way to make them simple.  It's certainly 
not a perfect piece of software, but the learning curve speaks for itself.  It 
can auto-shrink your databases ( without locking problems ).  Actually it pretty 
much runs itself.  It auto-allocates RAM for you ( up to the ceiling *you* 
control ).  It automatically re-analyzes itself.. I was able to successfully 
manage several servers with not insignificant amounts of data in them for many 
years without being a trained DBA.  After switching to PG, I found myself having 
to twiddle with all sorts of settings that seemed like it should just know about 
without me having to tell it.

I'm not saying it was simple to make it do that.  MS has invested LOTS of money 
and effort into making it that way.  I don't expect PG to have features like 
that tomorrow or even next release.  But, I feel it's important to make sure 
that those who *can* realistically take steps in that direction understand this 
point of view ( and with Josh's other reply to this, I think many do ).

I like to think of my systems as good employees.  I don't want to have to 
micromanage everything they do.  I want to tell them "here's what I want done", 
and assuming I made a good hiring choice, they will do it and take some liberty 
to adjust parameters where needed to achieve the spirit of the goal, rather than 
  blindly do something inefficiently because I failed to explain to them the 
absolute most efficient way to accomplish the task.

That's silly.  No software does that.  You're asking software to behave like
humans.  If that were the case, this would be Isaac Asimov's world, not the
real one.

It's not silly.  There are plenty of systems that do that.  Maybe you just 
haven't used them.  Again, SQL Server did a lot of those things for me.  I 
didn't have to fiddle with checkboxes or multi-select tuning options.  It 
learned what its load was and reacted appropriately.  I never had to stare at 
planner outputs and try and figure out why the heck did it choose that plan. 
Although, I certainly could have if I wanted to.  It has a tool called the SQL 
Profiler which will "watch" your workload on the database, do regression testing 
and suggest ( and optionally implement with a single click ) indexes on your 
tables.  I've been wanting to do this for years with PG, and had a small start 
on a project to do just that actually.

Granted, there are some people who don't like the developers making any 
assumptions about their workload.  But this doesn't have to be an either/or 
proposition.  I don't think any control needs to be abandoned.  But 
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us 
the patch" ).  I just don't know if this feeling has resonated well between new 
users and long-term developers.  I know it must be grating to have to answer the 
same questions over and over and over "have you analyzed?  Did you leave 
postgresql.conf at the defaults??".  Seems like a win-win for both sides, IMHO.

Well, it seems like this is happening where it's practical -- autovacuum is
a good example.

Agreed, this is a huge step forward.  And again, I'm not taking an offensive 
posture on this.  Just that I think it's worth giving my .02 since I have had 
strong feelings about this for awhile.

Personally, I wouldn't be opposed to more automagic stuff, just as long as
I have the option to disable it.  There are some cases where I still
disable autovac.

In closing, I am not bashing PG!  I love it and swear by it.  These comments are 
purely from an advocacy perspective.  I'd love to see PG user base continue to grow.

I expect that part of the problem is "who's going to do it?"

Yes, this is the classic problem.  I'm not demanding anyone pick up the ball and 
jump on this today, tomorrow, etc.. I just think it would be good for those who 
*could* make a difference to keep those goals in mind when they continue.  If 
you have the right mindset, this problem will fix itself over time.

-Dan