Re: Configuration Recommendations

"Greg Sabino Mullane" <greg@xxxxxxxxxxxx> · Tue, 15 May 2012 17:53:57 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

>>> Is it established practice in the Postgres world to separate indexes
>>> from tables? I would assume that the reasoning of Richard Foote -
>>> albeit for Oracle databases - is also true for Postgres:
>
>> Yes, it's an established practice. I'd call it something just short of
>> a best practice though, as it really depends on your situation.
>
> What are the benefits?

Disk seeks, basically. Yes, there are a lot of complications regarding 
all the various hardware/OS/PG level cachings, but at the end of the 
day, it's less work to have each drive concentrate on a single area 
(especially as we always require a heap scan at the moment).

>> I also find his examples a bit contrived, and the whole "multi-user"
>> argument irrelevant for common cases.
>
> Why is that?

Because most Postgres servers are dedicated to serving the same data 
or sets of data, and the number of "other users" calling ad-hoc queries 
against lots of different tables (per his example) is small. So this 
sentence just doesn't ring true to me:

    " ... by the time weÃ¢ve read the index leaf block, processed and 
    read all the associated table blocks referenced by the index leaf 
    block, the chances of there being no subsequent physical activity 
    in the index tablespace due to another user session is virtually 
    nil. We would still need to re-scan the disk to physically access 
    the next index leaf block (or table block) anyways."

That's certainly not true for Postgres servers, and I doubt if it 
is quite that bad on Oracle either.

>> I lean towards using separate tablespaces in Postgres, as the 
>> performance outweighs the additional>> complexity.

> What about his argument with regards to access patterns (i.e.
> interleaving index and table access during an index scan)?  Also,
> Shaun's advice to have more spindles available sounds convincing to
> me, too.

I don't buy his arguments. To do so, you'd have to buy a key point:

    "when most physical I/Os in both index and table segments are 
     effectively random, single block reads"

They are not; hence, the rest of his argument falls apart. Certainly, 
if things were as truly random and uncached as he states, there would 
be no benefit to separation.

As far as spindles, yes: like RAM, it's seldom the case to have 
too litte :) But as with all things, one should get some benchmarks 
on your specific workload before making hardware changes. (Well, RAM 
may be an exception to that, up to a point).

>> It's down on the tuning list however: much more important
>> is getting your kernel/volumes configured correctly, allocating
>> shared_buffers sanely, separating pg_xlog, etc.

> That does make a lot of sense.  Separating pg_xlog would probably the
> first thing I'd do especially since the IO pattern is so dramatically
> different from tablespace IO access patterns.

Yep - moving pg_xlog to something optimized for small, constantly 
written files is one of the biggest and easiest wins. Other than 
fsync = off ;)

- -- 
Greg Sabino Mullane greg@xxxxxxxxxxxx
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201205151351
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk+yl8YACgkQvJuQZxSWSshB+QCghfweMspFIqmP4rLv6/tcGPot
jscAn1SZAP1/KBcu/FEpWXilSnWjlA6Z
=FX7j
-----END PGP SIGNATURE-----

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance