On Mon, May 16, 2011 at 4:31 PM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> On Mon, May 16, 2011 at 4:19 AM, Robert Klemme
> <shortcutter@xxxxxxxxxxxxxx> wrote:
>>> - If the planner chooses a Bitmap Index Scan, it effectively scans the
>>> index to figure out which table blocks to read, and then reads those
>>> table blocks in block number order, so that the I/O is sequential,
>>> with skips.
>>
>> Are these two separate phases (i.e. first scan index completely, then
>> access table)?
>
> Yes.

So a single query will only ever access one of the two at a time.

>> Separating indexes and tables might not be a good idea in general.
>> Richard Foote has an excellent article about Oracle, but I assume at
>> least a few things apply to PostgreSQL as well - it's at least worth
>> checking PostgreSQL's access patterns against it:
>>
>> http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/
>>
>> I would probably rather try to separate data by the nature and
>> frequency of accesses. One reasonable separation would be to leave
>> all frequently accessed tables *and* their indexes on local RAID and
>> move less frequently accessed data to the SAN. This separation is
>> easy to identify if you have separate tables for current and
>> historic data.
>
> Yeah, I think the idea of putting tables and indexes in separate
> tablespaces is mostly to bring more I/O bandwidth to bear on the same
> data.

Richard commented on that as well; I believe it was in
http://richardfoote.wordpress.com/2008/04/18/separate-indexes-from-tables-some-thoughts-part-ii-there-there/

The main point is that you do not benefit from the larger I/O bandwidth
if the access patterns do not permit parallel access to both disks
(e.g. because you first need to read index blocks in order to know
which table blocks to read). The story might be different, though, if
you have a lot of concurrent accesses. But even then, if the table is a
hotspot, chances are that the index blocks are cached and you only need
physical I/O for the table blocks...

> But there are other reasonable things you might do also - e.g.
> put the indexes on an SSD, and the tables on a spinning disk, figuring
> that the SSD is less reliable but you can always rebuild the index if
> you need to...

Richard commented on that theory as well:
http://richardfoote.wordpress.com/2008/05/02/indexes-in-their-own-tablespace-recoverability-advantages-get-back/

The point: if you do the math, you may find that lost indexes lead to
so much downtime that you don't want to risk it - and the rebuild isn't
all that cheap in terms of time. For a reasonably sized database,
recovery might be significantly faster than rebuilding.

> Also, a lot of people have reported big speedups from putting pg_xlog
> on a dedicated RAID 1 pair, or moving the PostgreSQL logs off the data
> partition. So those sorts of divisions should be considered also.

Now, this is something I'd seriously consider, because access patterns
to pg_xlog are vastly different from those of table and index data! So
you want to have pg_xlog on a device with high reliability and high
write speed.

> Your idea of dividing things by access frequency is another good
> thought.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
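P.S. For anyone who wants to see the two-phase behaviour from the top
of this mail for themselves, EXPLAIN shows it directly. The table and
index names below are made up and I've left out the cost figures, but
the plan shape is the interesting part:

  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

  Bitmap Heap Scan on orders
    Recheck Cond: (customer_id = 42)
    ->  Bitmap Index Scan on orders_customer_id_idx
          Index Cond: (customer_id = 42)

The inner Bitmap Index Scan runs first and builds a bitmap of matching
table blocks from the index alone; the Bitmap Heap Scan then fetches
those blocks in block-number order. Index and table are not read in
parallel, which is exactly why extra bandwidth from separate disks does
not help a single query of this kind.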
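P.P.S. And to make the "separate by access frequency" idea concrete,
a minimal sketch in SQL - tablespace, table and index names as well as
the mount points are of course invented, and the directories must
already exist and be owned by the postgres user:

  CREATE TABLESPACE hot_data LOCATION '/mnt/local_raid/pgdata';
  CREATE TABLESPACE cold_data LOCATION '/mnt/san/pgdata';

  -- current data: the table *and* its index stay on the fast local RAID
  ALTER TABLE orders_current SET TABLESPACE hot_data;
  ALTER INDEX orders_current_pkey SET TABLESPACE hot_data;

  -- historic data can move off to the SAN
  ALTER TABLE orders_2010 SET TABLESPACE cold_data;
  ALTER INDEX orders_2010_pkey SET TABLESPACE cold_data;

Note that ALTER TABLE ... SET TABLESPACE physically rewrites the
relation and holds an exclusive lock while doing so, so on large tables
this is something for a maintenance window.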