Hello,

On Sat, 18 Jan 2014 18:51:29 -0500 Gautam Saxena wrote:

> I'm trying to maximize ephemeral Windows 7 32-bit performance with
> CEPH's RBD as back-end storage engine. (I'm not worried about data loss,
> as these VMs are all ephemeral, but I am worried about performance and
> responsiveness of the VMs.) My questions are:
>
The first thing you probably want to determine is the amount of writes
happening on those RBDs: IOPS, and also the type of writes, as in writes
forced out by flushes from the OS or application versus writes that are
allowed to accumulate in disk caches.

> 1) Are there any recommendations or best practices on the CEPH RBD cache
> settings? I don't fully understand how the above parameters come into
> play. Can someone provide some clarification, perhaps through a
> quick-and-dirty scenario example?
>
> The defaults seem low. So, I'm thinking of setting the RBD cache size to
> something like 1 GB, the cache max dirty to 1 GB, the cache target dirty
> to 500 MB, and the cache max dirty age to say 30 seconds.
>
Your servers must be filled to the brim with RAM if you are considering
giving each RBD mapping a 1 GB cache.

Again, see above: this will depend on the type of writes, but I think
you'll quickly see diminishing returns here. The RBD cache works best for
consolidating lots of small, cacheable writes which would otherwise result
in many IOPS on the storage backend (OSDs).

Personally I'm thinking of doubling the defaults once I get to that stage.
You might also find that giving the OS that RAM instead could be
beneficial.

> 2) What do people think of my using a separate pool of replication
> factor 1 for the "copy-on-write" portion of the clones of these
> *ephemeral* VMs? Would this further improve performance for these
> *ephemeral* Windows VMs?
>
This will speed things up, since only one OSD needs to confirm the write
(to an SSD-backed journal, since you're that performance conscious).
However, no matter how little you mind data loss, I would be worried about
the fact that ALL these RBDs will be unavailable if just one disk (OSD)
fails.

> 3) In addition to #2, what if I made this additional pool (of
> replication factor 1) reside on the host node's RAM (ramdisk)? Pros/cons
> to this idea? (I'm hoping this would minimize the impact of boot storms
> and also improve overall responsiveness.)

How many VMs and associated RBDs are we talking about here?

If you create a pool on a ramdisk of just one storage node, you're likely
to saturate the network interface of that node. And of course that pool
would have to be created/populated by some scripts of your own doing
before Ceph is started...

Your best bet is probably a design where the number of OSDs per node is
matched to that node's capabilities (CPU/RAM and, most of all, network
bandwidth) and then to deploy as many of these nodes as sensible.

For example, with a single 10GigE public network interface per OSD node,
10 disks (OSDs) would easily be able to saturate that link. However that's
just bandwidth; if you're IOPS bound (and mostly that seems to be the
case), then more disks per node can make sense. Of course more storage
nodes will help even more; it all becomes a question of how much money and
rack space you're willing to spend. ^o^

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
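
P.S.: A quick sketch of how to get at the write numbers mentioned at the
top. These are stock Ceph commands; the admin socket path is only an
example, use whatever you actually configure on the hypervisor:

    # aggregate client I/O (ops/s and throughput) per pool, cluster-wide
    ceph osd pool stats
    # or watch it scroll by live
    ceph -w

    # with e.g. "admin socket = /var/run/ceph/$name.$pid.asok" in the
    # [client] section on the hypervisor, librbd exposes its cache and
    # I/O counters via that socket:
    ceph --admin-daemon /var/run/ceph/client.admin.12345.asok perf dump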
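
P.P.S.: For reference, the cache knobs from question 1 live in the
[client] section of ceph.conf on the hypervisor. The values below are
only an illustration of the "double the defaults" idea, and the defaults
in the comments are from memory, so check the documentation for your
release:

    [client]
    rbd cache = true
    rbd cache size = 67108864           # 64 MB, default 32 MB
    rbd cache max dirty = 50331648      # 48 MB, default 24 MB
    rbd cache target dirty = 33554432   # 32 MB, default 16 MB
    rbd cache max dirty age = 2         # seconds, default 1
    # setting "rbd cache max dirty = 0" makes the cache write-through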
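
P.P.P.S.: And if you do try the size 1 pool for the copy-on-write layer
despite the caveat above, it is just something along these lines (pool
name and PG count are placeholders, size pg_num for your actual number of
OSDs):

    # create the pool, then drop it to a single replica
    ceph osd pool create ephemeral-cow 128 128
    ceph osd pool set ephemeral-cow size 1
    ceph osd pool set ephemeral-cow min_size 1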