Re: How to tell a VM to write more to local ceph nodes than to the network.

On 14 January 2015 at 21:46, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Tue, Jan 13, 2015 at 1:03 PM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
> I have a 4 node ceph cluster, but the disks are not equally distributed
> across all machines (they are substantially different from each other)
>
> One machine has 12 x 1TB SAS drives (h1), another has 8 x 300GB SAS (s3) and
> two machines have only two 1 TB drives each (s2 & s1).
>
> Now machine s3 has by far the most CPUs and RAM, so I'm running my VMs
> mostly from there, but I want to make sure that the writes that happen to
> the ceph cluster get written to the "local" OSDs on s3 first, and that the
> additional writes/copies then go out over the network.
>
> Is this possible with Ceph?  The VMs are KVM in Proxmox, in case it's
> relevant.

In general you can't set up Ceph to write to the local node first. In
some specific cases you can if you're willing to do a lot more work
around placement, and this *might* be one of those cases.

To do this, you'd need to change the CRUSH rules pretty extensively,
so that instead of selecting OSDs at random, they have two steps:
1) starting from bucket s3, select a random OSD and put it at the
front of the OSD list for the PG.
2) Starting from a bucket which contains all the other OSDs, select
N-1 more at random (where N is the number of desired replicas).

I understand in principle what you're saying.  Let me go back a step and ask the question somewhat differently, then:

I have set up 4 machines in a cluster.  When I created the Windows 2008 Server VM on S1 (correcting my first email: I have three Sunfire X series servers, S1, S2 and S3), it ran normally, pretty close to what I had on the bare metal, since S1 has 36GB of RAM and 8 x 300GB SAS drives.  About a month later (after being on leave for 2 weeks), I found the machine crawling at a snail's pace and I cannot figure out why.

So rather than suggesting something from my side (without in-depth knowledge yet): what should I do to get this machine running at speed again?

Further to my hardware and network: 

S1: 2 x Quad Core Xeon, 36GB RAM, 8 x 300GB HDDs
S2: 1 x Opteron Dual Core, 8GB RAM, 2 x 750GB HDDs
S3: 1 x Opteron Dual Core, 8GB RAM, 2 x 750GB HDDs
H1: 1 x Xeon Dual Core, 5GB RAM, 12 x 1TB HDDs
(All these machines are at full drive capacity, that is, all their drive slots are in use.)

All the servers are linked to a switch by dual Gigabit Ethernet connections, bonded on each server with LACP enabled on the switch.  While this doesn't raise the speed of any single connection, it does provide more aggregate bandwidth between the servers.
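
On each node the bond is set up along these lines in /etc/network/interfaces (the NIC names and addresses below are just placeholders, not the real ones):

auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100
        bond-xmit-hash-policy layer2+3

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

Because the hash policy pins any single flow to one physical link, an individual connection still tops out at 1Gbit/s; only the aggregate across multiple flows benefits.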

The H1 machine only runs Ceph and thus acts purely as storage.  The other machines (S1, S2 & S3) host the web servers (development and production), the Windows 2008 Server VM and a few other functions, all managed from Proxmox.

The hardware is what my client has been using, but there were lots of inefficiencies and little redundancy in the setup before we embarked on this project.  However, the hardware is sufficient for their needs.

I hope that gives you a reasonable picture of the setup so that you're able to give me some advice on how to troubleshoot this.
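
I'm happy to run whatever diagnostics would help.  I assume a sensible starting point is something along these lines (the pool name below is just a placeholder):

# overall cluster state: any HEALTH_WARN, recovery/backfill, near-full OSDs?
ceph -s
ceph health detail

# are all OSDs up/in and weighted the way I expect?
ceph osd tree

# per-OSD commit/apply latencies, to spot a single slow disk
ceph osd perf

# raw write throughput of a pool (note: this writes real objects)
rados bench -p rbd 30 write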

regards

Roland

 

You can look at the documentation on CRUSH or search the list archives
for more on this subject.

Note that doing this has a bunch of downsides: you'll have balance
issues because every piece of data will be on the s3 node (that's a
TERRIBLE name for a project which has API support for Amazon S3, btw
:p); if you add new VMs on a different node they'll all be going to
the s3 node for all their writes (unless you set them up on a
different pool with different CRUSH rules); s3 will be satisfying all
the read requests, so the other nodes are just backups in case of disk
failure; and so on.
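
For completeness: pointing a separate pool at a custom rule is cheap once the rule is compiled into the map. Something like the following would do it, with an arbitrary pool name and PG count and assuming the rule sketched above ended up as ruleset 1:

# create a dedicated pool for the s3-local VMs (size the PG count for your cluster)
ceph osd pool create s3local 128 128
# point it at the custom CRUSH ruleset
ceph osd pool set s3local crush_ruleset 1

You'd then put only the VMs that really need to live on s3 into that pool.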
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
