Re: Any suggestions on the best way to migrate / fix my cluster configuration

On Fri, Feb 13, 2015 at 1:11 AM, Carl J Taylor <cjtaylor@xxxxxxxxx> wrote:
> The setup was originally 3 machines, each with two 3TB disks, using ext4
> on the first disk and XFS on the second on each node.  One has 8GB RAM
> and the other two have 4GB RAM.

4GB is pretty light for a storage server: the usual guidance is on the
order of 1GB of RAM per TB of OSD storage, and recovery will want more.

> I decided to add a 4th node into the group and set that up with XFS on
> the primary disk and BTRFS on the second.  On this machine the journal,
> whilst on the same disk as the OSD, is in a raw partition on each disk.
> The memory is 8GB.

Having three different backing filesystems in the mix is not going to
make your life simple.  For a simple life, use XFS on all your OSDs
(or at least use the same filesystem on all your OSDs).
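
If you want to confirm what each OSD is currently sitting on, something
like this should do it (paths assume the default /var/lib/ceph layout):

    # On each node: show the filesystem type under each OSD data dir
    df -T /var/lib/ceph/osd/ceph-*

    # Or ask the cluster; the OSD metadata includes filesystem details
    # (exact field names vary between releases)
    ceph osd metadata 0 | grep -i fs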

> It took about 7 days for the data to finally move off the first node
> onto the fourth node, and already-bad performance became abysmal.

Backfills are never going to be fun on a small cluster: you don't have
enough drives in the system to spread the load over.  You can try
decreasing osd_max_backfills to mitigate the impact on performance.
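
For example (injectargs applies the change at runtime, no restart
needed):

    # Throttle backfill concurrency on every OSD (the default is 10)
    ceph tell osd.* injectargs '--osd-max-backfills 1'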

Since you say the cluster was created several years ago, you may also
be running with older CRUSH tunables that might have led to some
unnecessary data movement after the new OSDs were added.
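
You can check which profile you're on and, once the cluster has
settled, move to a newer one; note that changing tunables triggers its
own round of data movement, so do it deliberately:

    # Show the tunables currently in force
    ceph osd crush show-tunables

    # Switch to the recommended profile for your release
    # (expect some rebalancing afterwards)
    ceph osd crush tunables optimal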

> I had planned to add a PCI based SSD as a main drive and use the two disks
> with journals on the SSD but I could not get the SSD to work at all.

You don't say what went wrong with your SSD.  You should probably
stick with trying to get your SSD working: solid state journals can be
a big win for write performance.  You would need to put one in each
server to get consistent performance.
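
If you do get the SSD going and you're deploying with ceph-deploy, the
journal placement is just the third field of the OSD spec; the device
names below are placeholders for whatever your hardware presents:

    # Hypothetical layout: data on /dev/sdb, journal on SSD
    # partition /dev/sdc1
    ceph-deploy osd create node1:/dev/sdb:/dev/sdc1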

I note in your "ceph status" output that you have deep scrubs going
on.  Set the noscrub and nodeep-scrub flags (and wait for any ongoing
scrubs to finish) before trying to get any consistent numbers.
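
That is:

    # Pause scrubbing while you benchmark...
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # ...and re-enable it when you're done
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub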

Finally, assuming you're doing 3x replication, your entire cluster is
only giving you 6TB usable storage.  That's a sufficiently small
quantity of data that you might consider simply backing up, building a
fresh ceph cluster (preferably with symmetrically configured servers),
and restoring.
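
You can check the real figures before deciding:

    # Raw capacity, available space, and per-pool usage
    ceph df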

John



