Re: TB-sized databases

Russell Smith <mr-russ@xxxxxxxxxx> · Fri, 30 Nov 2007 17:41:53 +1100

Simon Riggs wrote:
> On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote:
>
>> Simon Riggs wrote:
>>
>>> All of those responses have lumped quite a few topics into one. Large
>>> databases might mean text warehouses, XML message stores, relational
>>> archives and fact-based business data warehouses.
>>>
>>> The main thing is that TB-sized databases are performance critical. So
>>> it all depends upon your workload really as to how well PostgreSQL, or
>>> any other RDBMS, can handle them.
>>>
>>> Anyway, my reason for replying to this thread is that I'm planning
>>> changes for PostgreSQL 8.4+ that will allow us to get bigger and
>>> faster databases. If anybody has specific concerns then I'd like to hear
>>> them so I can consider those things in the planning stages.
>>
>> It would be nice to do something with SELECTs so that we can retrieve a
>> rowset from huge tables using index criteria without falling back to a
>> full scan.
>>
>> In my opinion, by definition, a huge database sooner or later will have
>> tables far bigger than the available RAM (and the same goes for their
>> indexes). I think such queries need to be answered using indexes smart
>> enough to be fast on disk.
>
> OK, I agree with this one.
>
> I'd thought that index-only plans were only for OLTP, but now I see they
> can also make a big difference with DW queries. So I'm very interested
> in this area now.

If that's true, then you want to get behind the work Gokulakannan
Somasundaram
(http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php) has
done in relation to thick indexes. I would expect that concept to be
particularly useful in DW. Only having to scan the indexes on a number of
joined tables would be a huge win for some of these types of queries.
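
To make the thick-index idea a little more concrete, here is a toy Python sketch. It is not PostgreSQL code and not necessarily how the patch linked above works. The assumption is that each index entry carries its own xmin/xmax visibility fields (normally kept only in the heap tuple), so a range count can be answered from the index alone. ThickEntry, ThickIndex, visible and range_count are hypothetical names, and the snapshot test treats every transaction id at or below the snapshot as committed, which is a large simplification. The delete method also shows the cost behind the UPDATE/DELETE contention complaints discussed next: the index entry must be written as well as the heap tuple.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThickEntry:
    key: int
    tid: int                     # pointer to the heap tuple (hypothetical)
    xmin: int                    # inserting transaction id
    xmax: Optional[int] = None   # deleting transaction id; None while live


def visible(entry: ThickEntry, snapshot_xid: int) -> bool:
    # Big simplification: every xid <= snapshot_xid is treated as committed,
    # with no in-progress or aborted transactions.
    return entry.xmin <= snapshot_xid and (
        entry.xmax is None or entry.xmax > snapshot_xid
    )


class ThickIndex:
    """Toy sorted index whose entries carry their own visibility info."""

    def __init__(self) -> None:
        self._keys: List[int] = []            # sorted keys, parallel to _entries
        self._entries: List[ThickEntry] = []

    def insert(self, key: int, tid: int, xid: int) -> None:
        pos = bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._entries.insert(pos, ThickEntry(key, tid, xmin=xid))

    def delete(self, key: int, tid: int, xid: int) -> bool:
        # The extra write a "thin" index avoids: the deleting xid must be
        # stamped on the index entry too, not just on the heap tuple.
        lo, hi = bisect_left(self._keys, key), bisect_right(self._keys, key)
        for entry in self._entries[lo:hi]:
            if entry.tid == tid and entry.xmax is None:
                entry.xmax = xid
                return True
        return False

    def range_count(self, lo_key: int, hi_key: int, snapshot_xid: int) -> int:
        # Answered from the index alone: no heap visit is needed to check
        # visibility, which is the point of an index-only plan.
        start = bisect_left(self._keys, lo_key)
        end = bisect_right(self._keys, hi_key)
        return sum(1 for e in self._entries[start:end] if visible(e, snapshot_xid))


if __name__ == "__main__":
    idx = ThickIndex()
    for tid, key in enumerate([10, 20, 20, 30, 40]):
        idx.insert(key, tid, xid=1)
    idx.delete(20, 1, xid=5)
    print(idx.range_count(15, 35, snapshot_xid=4))  # 3: delete not yet visible
    print(idx.range_count(15, 35, snapshot_xid=5))  # 2
```

The trade-off the sketch illustrates is the one argued in this message: reads skip the heap entirely, while every delete (and, in a real system, every update) pays for an extra index write. That extra write matters less when older partitions are rarely modified.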

From my tiny point of view, that is a much better investment than
setting up the proposed parameter, though I can see the use of the
parameter. Most of the complaints about indexes carrying visibility
information are about UPDATE/DELETE contention. I would expect that in a
DW those operations aren't in the critical path the way they are in many
other applications. Especially with partitioning, where older partitions
don't get many updates, I would think there could be great benefit. I
would also expect many of Pablo's requests up-thread to gain significant
performance from this type of index. But as I mentioned at the start,
that's my tiny point of view and I certainly don't have the resources to
direct what gets looked at for PostgreSQL.

Regards

Russell Smith
