Re: postgre vs MySQL

paul rivers <rivers.paul@xxxxxxxxx> · Wed, 12 Mar 2008 13:02:45 -0700

Alvaro Herrera wrote:
Ivan Sergio Borgonovo wrote:

On Wed, 12 Mar 2008 09:13:14 -0700
paul rivers <rivers.paul@xxxxxxxxx> wrote:

For a database of InnoDB tables, people tend to replicate the
database, and then backup the slave (unless the db is trivially

That recalled me the *unsupported* feeling I have that it is easier
to setup a HA replication solution on MySQL.

Well, if you have a crappy system that cannot sustain concurrent load or
even be backed up concurrently with regular operation, one solution is
to write a kick-ass replication system.

The other solution is to enhance the ability of the system to deal with
concurrent operation.

We keep hearing how great all those Web 2.0 sites are; Slashdot, Flickr,
etc; and they all run on farms and farms of MySQL servers, "because
MySQL replication is so good".  I wonder if replication is an actual
_need_ or it's there just because the other aspects of the system are so
crappy

"Kick-ass" imho really means "really simple to setup" and included as 
part of the standard db.

There are all kinds of corner cases that can bite you with MySQL 
replication. Offhand, I wager most of these (at least in InnoDB) result 
from the replication "commit" status of a transaction is in the binlogs, 
which is not the same as the InnoDB database commit status in the .ibd 
files. Writing out binlog entries happens at a higher level than the 
storage engine, and so it's not hard to imagine what can go wrong there. 
There are a few my.cnf settings that let you really roll the dice with 
data integrity based on this dichotomy, if you so choose.

In those high volume shops, imho replication is a requirement, but in 
part to overcome technical limitations of MySQL. Or to phrase it from a 
MySQL point of view, to do it the MySQL way. If you have 50-ish minutes, 
this video by the YouTube people talks about their evolution with MySQL 
(among many other things) :

http://video.google.com/videoplay?docid=-6304964351441328559

The summary from the video is:

- Start with a MySQL instance using InnoDB
- Go to 1-M replication, and use the replicants as read-only version.
- Eventually the cost of replication outweighs the gains, so go to 
database sharding
- Keep 1-M replication within a shard group to allow easy backups of a 
slave, some read-only use of the slaves, and a new master in case of 
master failure (i.e. high availability)

Almost everyone finds MyISAM unworkable in large scale environments 
because of the repairs necessary post-crash.

Big complaints about MySQL high-volume shops often, imho, come back to :

- You can only have so many active threads in the InnoDB storage engine 
module at a time. See e.g.:

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#option_mysqld_innodb_thread_concurrency

- Auto_increment columns as pkeys in InnoDB tables are practically 
required, yet severely limited scalability due to how a transaction 
would lock the structure to get the next auto-increment (significantly 
improved in 5.1)

- Shutting down a MySQL engine can take forever, due partly dirty page 
writes, partly due to insert buffer merging. See:

http://dev.mysql.com/doc/refman/5.1/en/innodb-insert-buffering.html

There are other complaints you'd expect people to have, but don't seem 
to get talked about much, because people are so used to (from my point 
of view) working around them. For example, statistics on an InnoDB table 
are calculated when the table is first accessed, but not stored 
anywhere, so there are extra costs on database startup. The backup issue 
with InnoDB has already been covered. Tablespace management in InnoDB 
seems exceptionally primitive, and is helped somewhat by the 
tablespace-per-table option. There are many more, again imho.

Paul

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general