
Re: Four issues why "old elephants" lack performance: Explanation sought


 





On Sun, Feb 26, 2012 at 12:11 PM, Stefan Keller <sfkeller@xxxxxxxxx> wrote:
Thanks to all who responded so far. I got some more insights from Mike
Stonebraker himself in the USENIX talk Scott pointed to before.
I'd like to slightly revise the four points I enumerated in my
initial question and to sort out what PG already does or could do:

1. Buffer Pool

To get rid of the I/O bottleneck, Mike proposes in-memory database structures.
He argues that the "old elephants" cannot implement this because it
would require a huge code rewrite: they would need to store
memory-oriented structures instead of disk-oriented ones.
I'm still wondering why PG couldn't achieve that, perhaps in
combination with unlogged tables. I don't know the respective code well,
but I think it's worth discussing even if implementing
memory-oriented structures would be too difficult.

The reason is that the existing data structures are designed for disk: they are laid out to be efficient to look up on disk, and they are not as efficient in memory.

Note that VoltDB is a niche product and Stonebraker makes this pretty clear.  However, the more interesting question is what the tradeoffs are when looking at VoltDB vs Postgres-XC.
 

2. Locking

This critique obviously doesn't hold for PG, since we already have MVCC here.
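
To make the MVCC point concrete, here is a toy sketch of the idea that a reader working from a snapshot never waits for a later writer. It is not how PostgreSQL implements MVCC; the class `VersionedStore`, its methods and the integer timestamps are all hypothetical names invented for this illustration, and the sketch is single-threaded.

```python
class VersionedStore:
    """Toy multi-version store: every write appends a new version.

    Illustration only. This is NOT PostgreSQL's tuple format or
    visibility logic; all names here are made up for the example.
    """

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # last commit timestamp handed out

    def write(self, key, value):
        # A write never overwrites in place; it adds a newer version.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        # A snapshot is simply "everything committed up to now".
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version that the snapshot is allowed to see.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("balance", 100)
snap = store.snapshot()       # reader takes its snapshot here
store.write("balance", 250)   # a later write does not disturb the reader
old_view = store.read("balance", snap)
new_view = store.read("balance", store.snapshot())
```

The reader holding `snap` keeps seeing the older value while a fresh snapshot sees the newer one, so no read lock is needed. Writers that touch the same row still have to be coordinated, which this sketch leaves out.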

3. WAL logging

Here Mike proposes replication over several nodes as an alternative to
WAL which fits nicely with High Availability. PG 9 has built-in
replication but just not for unlogged tables :-<

I find it interesting that two of the four areas he identifies have to do with durability.....
 

4. Latches

This is an issue I had never heard of before. I found some notion of latches
in the code, but it doesn't seem to be related to concurrently accessing
btree structures as Mike suggests.
So if anyone could confirm that this problem exists and produces overhead,
I'd be interested to hear.
Mike proposes single threads running on many cores, where each core
processes a non-overlapping shard.
But he also calls for ideas to invent btrees which can be processed
concurrently with as few memory locks as possible (instead of looking
to make btrees faster).

So to me the bottom line is that PG already has reduced overhead at
least for issue #2 and perhaps for #4.
What remains to be investigated in PG are in-memory optimization (#1)
and replication (#3) together with High Availability.


If he were looking at PostgreSQL for #4, I think that would be stuff like waiting for semaphores...  Since that work is probably minimal and each PostgreSQL backend process is single-threaded, I suspect the overhead in this area is low.

The issue seems to be concurrent access to shared data structures, which is a problem particularly when you start looking at multithreaded backends...
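
A rough sketch of the two designs being contrasted, in Python for brevity: one shared structure guarded by a short-lived lock (a "latch"), and the sharded alternative where each worker owns a non-overlapping partition and needs no latch. The function names `run_latched` and `run_sharded` and the modulo partitioning are assumptions made up for this example; this is not PostgreSQL or VoltDB code, and because of Python's global interpreter lock it shows structure only, not performance.

```python
import threading


def run_latched(keys, n_workers):
    """All workers update ONE shared dict, so every update takes the latch."""
    latch = threading.Lock()
    counts = {}

    def worker(chunk):
        for k in chunk:
            with latch:  # every worker contends on this one lock
                counts[k] = counts.get(k, 0) + 1

    chunks = [keys[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def run_sharded(keys, n_workers):
    """Each worker owns one shard (key % n_workers), so no latch is needed."""
    shards = [{} for _ in range(n_workers)]

    def worker(shard_id):
        local = shards[shard_id]  # only this worker ever touches this dict
        for k in keys:
            if k % n_workers == shard_id:
                local[k] = local.get(k, 0) + 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = {}
    for shard in shards:  # shards are disjoint, so merging cannot collide
        merged.update(shard)
    return merged


keys = [i % 50 for i in range(5000)]  # integer keys, hypothetical workload
latched_result = run_latched(keys, 4)
sharded_result = run_sharded(keys, 4)
```

Both versions produce the same counts. The difference is where coordination happens: the latched version pays for it on every update, while the sharded version pays up front by routing each key to exactly one owner.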

Best Wishes,
Chris Travers
