Re: The need for clustered indexes to boost TPC-V performance

Daniel Farina <daniel@xxxxxxxxxx> · Wed, 4 Jul 2012 06:39:38 -0700

On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:
> On 07/04/2012 07:13 AM, Reza Taheri wrote:
>
> Following the earlier email introducing the TPC-V benchmark, and that we are
> developing an industry standard benchmarking kit for TPC-V using PostgreSQL,
> here is a specific performance issue we have run into.
>
>
> Which version of PostgreSQL are you using?
>
> How has it been tuned beyond the defaults - autovacuum settings,
> shared_buffers, effective_cache_size, WAL settings, etc?
>
> How much RAM is on the blade? What OS and version are on the blade?
>
>
> Comparing the table sizes, we are close to 2X larger (more on this in a
> later note). But the index size is what stands out. Our overall index usage
> (again, after accounting for different numbers of rows) is 4.8X times
> larger. 35% of our I/Os are to the index space. I am guessing that the 4.8X
> ballooning has something to do with this, and that in itself explains a lot
> about our high I/O rate, as well as higher CPU/tran cycles compared to MS
> SQL (we are  2.5-3 times slower).
>
> This is making me wonder about bloat issues and whether proper vacuuming is
> being done. If the visibility map and free space map aren't maintained by
> proper vaccum operation everything gets messy, fast.
>
> Well, MS SQL used a “clustered index” for CT, i.e., the data is held in the
> leaf pages of the index B-Tree. The data and index are in one data
> structure. Once you lookup the index, you also have the data at zero
> additional cost.
>
> [snip]
>
>
>
> Is the PGSQL community willing to invest in a feature that a) has been
> requested by many others already; and b) can make a huge difference in a
> benchmark that can lend substantial credibility to PGSQL performance?
>
>
> while PostgreSQL doesn't support covering indexes or clustered indexes at
> this point, 9.2 has added support for index-only scans, which are a half-way
> point of sorts. See:
>
>   http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
>   http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27
>
> If at all possible please see how your test is affected by this PostgreSQL
> 9.2 enhancement. It should make a big difference, and if it doesn't it's
> important to know why.
>
> (CC'd Robert Haas)
>
> I'm not sure what the best option for getting a 9.2 beta build for Windows
> is.
>
>
> As for the "invest" side - that's really a matter for EnterpriseDB, Command
> Prompt, Red Hat, and the other backers who're employing people to work on
> the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a
> good explanation of the current state and progress toward clustered indexes.
>
> Some links that may be useful to you are:
>
>   http://wiki.postgresql.org/wiki/Todo
>   Things that it'd be good to support/implement at some point. Surprisingly,
> covering/clustered indexes aren't on there or at least aren't easily found.
> It's certainly a much-desired feature despite its apparent absence from the
> TODO.

I think there is, deservingly, a lot of hesitation to implement a
strictly ordered table construct.  A similar feature that didn't quite
get finished -- but maybe can be beaten into shape -- is the
grouped-index-tuple implementation:

http://community.enterprisedb.com/git/

It is mentioned on the TODO page.  It's under the category that is
perhaps poorly syntactically overloaded in the world "cluster".

-- 
fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance