On Tue, 1 May 2007, Josh Berkus wrote:
> there is no standard way even within Linux to describe CPUs, for example. Collecting available disk space information is even worse. So I'd like some help on this portion.
I'm not fooled--secretly you and your co-workers laugh at how easy this is on Solaris and are perfectly happy with how difficult it is on Linux, right?
I joke because I've been re-solving some variant of this problem every few years for a decade now and it just won't go away. Last time I checked, the right answer was to find someone else who has already done it, packaged it into a library, and appears committed to keeping it up to date; then just pull a new rev of that when you need it. For example, for the CPU/memory part, top solves this problem and is always kept current, so on open-source platforms there's the potential to re-use that code. Now that I know that's one thing you're (understandably) fighting with, I'll dig up my references on that (again).
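As a rough illustration of the "let someone else's maintained code do it" approach, here is a minimal Python sketch that leans on the standard library instead of parsing platform-specific files by hand. Using Python at all is my assumption, and gather_system_info and data_dir are made-up names for illustration. The memory lookup relies on sysconf names that exist on Linux and Solaris but not everywhere, so it is treated as optional.

```python
import os
import shutil


def gather_system_info(data_dir="."):
    """Collect a few hardware facts using only the Python standard library."""
    # os.cpu_count() may return None when the count cannot be determined.
    info = {"cpu_count": os.cpu_count()}

    # Total and free space on the filesystem that holds data_dir
    # (data_dir is a hypothetical stand-in for the database directory).
    usage = shutil.disk_usage(data_dir)
    info["disk_total_bytes"] = usage.total
    info["disk_free_bytes"] = usage.free

    # Physical memory: these sysconf names are not available on every
    # platform, so fall back to None rather than guessing.
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        info["memory_bytes"] = pages * page_size
    except (AttributeError, ValueError, OSError):
        info["memory_bytes"] = None
    return info


if __name__ == "__main__":
    for key, value in gather_system_info().items():
        print(f"{key}: {value}")
```

This only covers the easy part; anything richer (CPU model, cache sizes, per-tablespace disks) is exactly where borrowing code from something like top would pay off.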
> It's also hard/impossible to devise tuning algorithms that work for both gross tuning (increase shared_buffers by 100x) and fine tuning (decrease bgwriter_interval to 45ms).
I would advocate focusing on iterative improvements to an existing configuration rather than even bothering with generating a one-off config for exactly this reason. It *is* hard/impossible to get it right in a single shot, because of how many parameters interact and the way bottlenecks clear, so why not assume from the start you're going to do it several times--then you've only got one piece of software to write.
The idea I have in my head is a tool that gathers system info, connects to the database, and then spits out recommendations in order of expected effectiveness--with the specific caveat that changing too many things at one time isn't recommended, and some notion of parameter dependencies. The first time you run it, you'd be told that shared_buffers was wildly low, effective_cache_size isn't even in the right ballpark, and your work_mem looks small relative to the size of your tables; fix those before you bother doing anything else because any data collected with those at very wrong values is bogus. Take two, those parameters pass their sanity tests, but since you're actually running at a reasonable speed now the fact that your tables are no longer being vacuumed frequently enough might bubble to the top.
It would take a few passes to nail everything down, but as long as it's put together so that a single run leaves you about where the single-shot tool would have, there's no need to build that tool separately.
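To make the iterative idea a bit more concrete, here is a small Python sketch of the ranking-plus-dependencies logic only. Everything in it is an assumption made for illustration: the Check structure, the expected_gain scores, the dictionary keys, and especially the thresholds, which are placeholders and not tuning advice. A real tool would fill the settings dictionary from the database (for example from the pg_settings view) instead of hard-coding it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Check:
    param: str            # setting this check is about
    expected_gain: int    # made-up score; higher means fix it sooner
    test: Callable[[dict, dict], Optional[str]]  # advice, or None if sane
    depends_on: list = field(default_factory=list)  # params that must be sane first


def recommend(checks, settings, hw, max_changes=3):
    """Return up to max_changes pieces of advice, most valuable first."""
    failing = {}
    for check in checks:
        advice = check.test(settings, hw)
        if advice is not None:
            failing[check.param] = (check, advice)

    # Hold back any finding whose prerequisites are themselves failing:
    # data collected while those are badly wrong can't be trusted.
    usable = [(check, advice) for check, advice in failing.values()
              if not any(dep in failing for dep in check.depends_on)]
    usable.sort(key=lambda pair: pair[0].expected_gain, reverse=True)
    return [f"{check.param}: {advice}" for check, advice in usable[:max_changes]]


MB = 1024 * 1024

# Placeholder rules; the thresholds are illustrative only.
CHECKS = [
    Check("shared_buffers", 100,
          lambda s, hw: "wildly low for this much RAM"
          if s["shared_buffers_bytes"] < hw["memory_bytes"] // 64 else None),
    Check("effective_cache_size", 80,
          lambda s, hw: "not in the right ballpark"
          if s["effective_cache_size_bytes"] < hw["memory_bytes"] // 8 else None),
    Check("autovacuum", 40,
          lambda s, hw: "tables are not being vacuumed often enough"
          if s["hours_since_vacuum"] > 24 else None,
          depends_on=["shared_buffers", "effective_cache_size"]),
]

hardware = {"memory_bytes": 4096 * MB}

# First run: the gross problems hide the vacuum finding.
first_pass = recommend(CHECKS, {"shared_buffers_bytes": 8 * MB,
                                "effective_cache_size_bytes": 64 * MB,
                                "hours_since_vacuum": 72}, hardware)

# Second run, after fixing those: the vacuum finding bubbles to the top.
second_pass = recommend(CHECKS, {"shared_buffers_bytes": 512 * MB,
                                 "effective_cache_size_bytes": 2048 * MB,
                                 "hours_since_vacuum": 72}, hardware)
```

The max_changes cap is where the "don't change too many things at once" caveat lives, and depends_on is a very crude stand-in for the parameter dependencies; both would need real thought in an actual tool.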
To argue against myself for a second, it may very well be the case that writing the simpler tool is the only way to get a useful prototype for building the more complicated one; it's very easy to get bogged down in feature creep on a grand design otherwise.
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD