Re: Poor performance with three nodes

Eric Lee Green <eric.lee.green@xxxxxxxxx> · Wed, 02 Oct 2013 15:16:16 -0700

On 10/2/2013 2:24 PM, Gregory Farnum wrote:
There's a couple things here:
1) You aren't accounting for Ceph's journaling. Unlike a system such
as NFS, Ceph provides *very* strong data integrity guarantees under
failure conditions, and in order to do so it does full data
journaling. So, yes, cut your total disk bandwidth in half. (There's
also a lot of syncing which it manages carefully to reduce the cost,
but if you had other writes happening via your NFS/iSCSI setups that
might have been hit by the OSD running a sync on its disk, that could
be dramatically impacting the perceived throughput.)

I was running iostat on the storage servers to see what was happening 
during the quick test, and was not seeing large amounts of I/O taking 
place, whether created by Ceph or not. From what you are saying I should 
have seen roughly 120 megabytes per second being written on at least one 
of the servers as the journal hit the server. I was seeing roughly 30 
megabytes per second on each server.
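
The arithmetic behind that expectation can be sketched roughly in Python. This is a back-of-the-envelope estimate, not a model of Ceph internals. It assumes each replica is written twice on its OSD's disk (once to the journal, once to the data store) because the journal shares the disk, that writes spread evenly across servers, and that three servers hold OSDs. The server count and the function names are illustrative assumptions, not taken from the cluster configuration.

```python
def expected_disk_write_per_server(client_mb_s, servers, replicas, journal_factor=2):
    """Estimate disk write rate per server for a given client write rate.

    Every client byte is stored `replicas` times, and each copy is assumed
    to hit the disk `journal_factor` times (journal + data store).
    """
    total_disk_mb_s = client_mb_s * replicas * journal_factor
    return total_disk_mb_s / servers


def implied_client_rate(observed_per_server_mb_s, servers, replicas, journal_factor=2):
    """Invert the estimate: what client rate would explain the observed disk I/O?"""
    total_disk_mb_s = observed_per_server_mb_s * servers
    return total_disk_mb_s / (replicas * journal_factor)


if __name__ == "__main__":
    # 30 MB/s per server is the iostat figure quoted above;
    # servers=3 is an assumption, replicas=3 matches the 3-copy setting.
    rate = implied_client_rate(30, servers=3, replicas=3)
    print(f"implied client write rate: {rate:.1f} MB/s")
```

Under these assumptions, the observed disk traffic corresponds to a client write rate well below what the disks or the network should sustain, which is consistent with the disks not being the obvious bottleneck.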

2) Placing an OSD (with its journal) on a RAID-6 is about the worst
thing you can do for Ceph's performance; it does a lot of small
flushed-to-disk IOs in the journal in between the full data writes.
Try some other configuration?

I don't have any other configuration. These servers are in production. 
They cannot be taken down. I don't have any more servers with 10 gigabit 
Ethernet cards (which are *not* cheap).  I'm starting to suspect that 
Ceph simply is not usable in my environment, which is a mixed-mode 
shared environment rather than something that can be dedicated to any 
single storage protocol. Too bad.

3) Did you explicitly set your PG counts at any point? They default to
8, which is entirely too low; given your setup you should have
400-1000 per pool.

They defaulted to 64, and according to the calculation in the 
documentation should be 500/3 = 166 per pool for my configuration. 
Still, that does not appear to be the issue here considering that I've 
created one block device and that's the only Ceph traffic.  I just 
raised it to 166 for the data pool, no difference.
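
For reference, the 500/3 figure appears to follow the rule of thumb of about 100 placement groups per OSD, divided by the replica count. A minimal Python sketch of that calculation, using the 5 OSDs shown in the `ceph -s` output and the 3 replicas configured here, follows. The rounding up to a power of two reflects a recommendation the Ceph documentation is understood to make and should be treated as an assumption; the function names are illustrative only.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb PG count: (OSDs * 100) / replicas, truncated to an integer."""
    return (num_osds * pgs_per_osd) // replicas


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    power = 1
    while power < n:
        power *= 2
    return power


if __name__ == "__main__":
    raw = suggested_pg_count(num_osds=5, replicas=3)
    print(raw)                     # raw rule-of-thumb value
    print(next_power_of_two(raw))  # rounded up, if that recommendation applies
```

Either value is far below the 400-1000 suggested earlier in the thread, so the two recommendations do not agree for a cluster this small.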

4) There could have been something wrong/going on with the system;
though I doubt it. But if you can provide the output of "ceph -s"
that'll let us check the basics.

Everything looks healthy.

[root@stack1 ~]# ceph -s
  cluster 26206dba-e976-4217-a3d4-c9ea02c188be
   health HEALTH_OK
   monmap e2: 3 mons at {stack1=10.200.0.3:6789/0,storage1=10.200.0.1:6789/0,storage2=10.200.0.2:6789/0}, election epoch 112, quorum 0,1,2 stack1,storage1,storage2
   osdmap e793: 5 osds: 5 up, 5 in
    pgmap v3504: 295 pgs: 295 active+clean; 21264 MB data, 30154 MB used, 10205 GB / 10235 GB avail
   mdsmap e25: 1/1/1 up {0=stack1=up:active}

Separately, if all you want is to ensure that data resides on at least
two servers, there are better ways than saying "each server has two
daemons, so I'll do 3-copy". See eg

I was as concerned about performance as I was about redundancy when I 
set it to three copies. I saw the crush map rules but for my purposes 
simply having three replicas was sufficient.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com