On 6/1/07, Andrew Sullivan <ajs@xxxxxxxxxxxxxxx> wrote:
> These are all different solutions to different problems, so it's not surprising that they look different. This was the reason I asked, "What is the problem you are trying to solve?"
You mean aside from the obvious one, scalability? The database is becoming a bottleneck for a lot of so-called "Web 2.0" apps that use a shared-nothing architecture (such as Rails, Django or PHP) in conjunction with a database. These apps issue lots of ad-hoc queries, which come not just from web hits but also from somewhat awkwardly fitting an object model onto a relational database. The "new" apps are typically intensely personal and contextual: every page is personalized for the visiting user, which means running a whole bunch of crazy multi-join queries to fetch the latest posts, the most recent recommendations from your friends, the most highly rated stuff. In fact, even something as seemingly simple as incrementing a row's counter every time a post is viewed will eventually have a negative performance impact on a traditional OLTP-optimized relational database.

I'm sure some people would disagree about the significance of the above (possibly by replying that a relational database is the wrong kind of tool for such apps), or would dispute that there is an urgent need to scale beyond a single server, but I would hope that at some point a solution appears that lets a database scale horizontally with minimal impact on the application. In light of this need, I think we could be more productive by rephrasing the question "how/when can we implement multimaster replication?" as "how/when can we implement horizontal scaling?"

As it stands today, horizontally partitioning a database into multiple separate "shards" is incredibly invasive to the application architecture, and typically relies on brittle and non-obvious hacks such as configuring sequence generators with staggered starting numbers, omitting referential integrity constraints, sacrificing transactional semantics, and moving query aggregation up into the application. On top of this, dumb caches such as Memcached are typically layered in front to avoid hitting the database in the first place.
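To make two of those hacks concrete (staggered sequence generators and app-level query aggregation), here is a minimal Python sketch under loudly stated assumptions. In-memory lists stand in for the shards, posts are routed by a plain modulus on the author ID, and the names `StaggeredSequence`, `Shard`, `shard_for` and `latest_posts` are hypothetical; they are not part of PostgreSQL, MySQL or any real sharding library. In PostgreSQL itself, the staggering would live in each shard's sequence definition via the `START WITH` and `INCREMENT BY` options of `CREATE SEQUENCE`.

```python
from heapq import nlargest


class StaggeredSequence:
    """Hands out start, start + step, start + 2*step, ... so that N shards
    sharing the same step with distinct starts 1..N never collide."""

    def __init__(self, start: int, step: int) -> None:
        self._next = start
        self._step = step

    def next_id(self) -> int:
        value = self._next
        self._next += self._step
        return value


class Shard:
    """Hypothetical in-memory stand-in for one database shard."""

    def __init__(self, index: int, count: int) -> None:
        # Shard k of `count` starts at k + 1 and steps by `count`.
        self.ids = StaggeredSequence(start=index + 1, step=count)
        # Rows are (post_id, author_id, created_at).
        self.posts: list[tuple[int, int, int]] = []

    def insert_post(self, author_id: int, created_at: int) -> int:
        post_id = self.ids.next_id()
        self.posts.append((post_id, author_id, created_at))
        return post_id


def shard_for(author_id: int, count: int) -> int:
    # Assumed routing rule: a plain modulus on the partitioning key.
    return author_id % count


def latest_posts(shards: list[Shard], author_ids: set[int], limit: int) -> list[tuple[int, int, int]]:
    # App-level aggregation: no single query can span the shards, so each
    # shard is filtered separately and the application merges the results.
    candidates = [p for shard in shards for p in shard.posts if p[1] in author_ids]
    return nlargest(limit, candidates, key=lambda p: p[2])


if __name__ == "__main__":
    shards = [Shard(i, 3) for i in range(3)]
    for author, ts in [(1, 10), (1, 20), (2, 15), (3, 30)]:
        shards[shard_for(author, 3)].insert_post(author, ts)
    print(latest_posts(shards, {1, 2}, limit=2))  # [(5, 1, 20), (3, 2, 15)]
```

Note what the sketch cannot do: nothing stops a post on one shard from referencing an author row that lives on another, and a write touching two shards is not atomic. Those are exactly the integrity and transactional trade-offs listed above.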
Still, with MySQL and a bit of glue, guys like eBay, Flickr and MySpace are partitioning their databases relatively successfully using such tricks. These guys are not average database users, but they are not the only ones who have suffered from database bottlenecks and overcome them using clever, if desperate, measures. Cal Henderson (or was it Stewart Butterfield?) of Flickr has famously said he would never again start a project that didn't have partitioning from the start.

I would love to see a discussion about how PostgreSQL could address these issues.

Alexander.