Gordan Bobic wrote:
> Note that DRBD resync is more efficient - it only resyncs dirty
> blocks, which in the case of big databases, can be much faster.
> Gluster will copy the whole file.
Thanks for pointing that out; I'll have to think about that. I had been
hoping GlusterFS did some sort of rsync equivalent, although even that
would still require reading the whole file locally.
> Did you flush the caches in between the tries? What is your network
> connection between the nodes?
I attempted to prime the cache for each measurement. I shut MySQL down
between tries, made the GlusterFS adjustments, restarted MySQL, and ran
the queries a few times before recording the stats.
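For cold-cache runs, priming isn't the only option: on Linux the page cache can be dropped between tries so every run starts cold. A minimal sketch (needs root):

```shell
# Flush dirty pages to disk first, then drop the page cache, dentries
# and inodes so the next benchmark run starts with cold caches.
sync
echo 3 > /proc/sys/vm/drop_caches
```

Dropping caches before each run makes the timings comparable without relying on the cache being warmed identically every time.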
I think the connection is 100M Ethernet. I'll have to double-check.
It's actually a Xen guest, so I'm a bit insulated. The nodes are on
separate physical servers, though, on purpose.
> What is the ping time between the servers? Have you measured the
> throughput between the servers with something like ftp on big files?
> Is it the writes or the reads that slow down? Try dumping to an ext3
> filesystem instead of to gluster.
Ping time is around 0.3ms. I'll have to spend some time doing these
other tests.
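For reference, both measurements are quick one-liners; `node2` is a placeholder for the peer's hostname, and iperf stands in here for the ftp-on-big-files test:

```shell
# Round-trip latency between the nodes ("node2" is a placeholder
# for the other server's hostname or IP).
ping -c 10 node2

# Raw TCP throughput between the nodes; iperf stands in for pushing
# big files over ftp (start "iperf -s" on node2 first).
iperf -c node2 -t 10
```

On 100M Ethernet the throughput test should top out around 11-12 MB/s; anything well below that points at the network rather than GlusterFS.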
I resumed testing with both servers running. Switching the I/O
scheduler to deadline had no appreciable effect. Neither did adding
client-side io-threads, or server-side write-behind. Surprisingly, I
found that changing read-subvolume to the remote server had only a
minor penalty.
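For anyone following along, the knobs above live in the volfile. This is only a sketch in the legacy volfile syntax, with `remote1`/`remote2` as placeholder subvolume names and write-behind shown client-side for brevity (not the actual config under test):

```
# Sketch of a client volfile (legacy syntax); remote1/remote2 are
# placeholder names for the two protocol/client subvolumes.
volume afr
  type cluster/afr
  option read-subvolume remote1   # prefer this subvolume for reads
  subvolumes remote1 remote2
end-volume

volume wb
  type performance/write-behind
  subvolumes afr
end-volume

volume iot
  type performance/io-threads
  option thread-count 8
  subvolumes wb
end-volume
```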
> Are you using single process client/server on each node, or separate
> client and server processes on both nodes?
Single process on each node.
Since I'm only going to have one server writing to the filesystem at
a time, I could mount it read-only (or not at all) on the other
server. Would that mean I could safely set data-lock-server-count=0
and entry-lock-server-count=0 because I can be confident that there
won't be any conflicting writes? I don't want to take unnecessary
risks, but it seems like unnecessary overhead for my use case.
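If it did turn out to be safe, the change would just be a couple of options on the replicate volume. Sketched below in the legacy volfile syntax, with placeholder subvolume names:

```
# cluster/afr fragment; setting the lock-server counts to 0 disables
# data/entry locking. Only safe when exactly one node ever writes.
volume afr
  type cluster/afr
  option data-lock-server-count 0
  option entry-lock-server-count 0
  subvolumes remote1 remote2
end-volume
```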
> Hmm... If the 1st server fails, the lock server will fail over to the
> next one, and you then fire up MySQL there. I thought you said it was
> only the 2nd server that suffers the penalty. Since the 2nd server
> will take over locking from the 1st if the 1st fails, the performance
> should be the same after fail-over. You'll still have the active
> server being the lock server.
The second server suffers a large penalty for having to lock on the
first server. I wonder if the first server might still be bearing an
unnecessary cost for doing the locking, even though it's faster when
it's local.
Another wrinkle is that I'd rather have the servers be equal peers so
that I don't need to have MySQL fail back to server #1 as soon as it
comes back up. If I want server #2 to stay fast even after server #1
comes back up, I'd need to stop GlusterFS, reorder the volfile on both
servers, and restart it. That seems somewhat difficult (particularly
changing the volfile on server #1 before it comes back up), and it would
be unnecessary if locking isn't adding any value.
Thanks!
-David