On 21/10/2013 22:45, Gregory Farnum wrote:
On Mon, Oct 21, 2013 at 8:05 AM, Pieter Steyn <pieter@xxxxxxxxxx> wrote:
Hi all,
I'm using Ceph as a file store for my nginx web server, in order to have
shared storage and redundancy with automatic failover.
The cluster is not high spec, but given my use case (lots of images) I am
very disappointed with the throughput I'm getting, and was hoping
for some advice.
I'm using CephFS and the latest Dumpling version on Ubuntu Server 12.04
Server specs:
CephFS1, CephFS2:
Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
12GB Ram
1x 2TB SATA XFS
1x 2TB SATA (For the journal)
Each server runs 1x OSD, 1x MON and 1x MDS.
A third server runs 1x MON so that the monitors can maintain a Paxos quorum.
All machines are connected via a gigabit switch.
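For context, the web servers mount CephFS roughly like this (a sketch only, assuming the kernel client; the mount point and secret file path below are illustrative, and the monitor addresses are the ones from the config further down):

mount -t ceph 192.168.1.58:6789,192.168.1.70:6789,192.168.1.72:6789:/ \
    /var/www/images -o name=admin,secretfile=/etc/ceph/admin.secret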
The Ceph config is as follows:
[global]
fsid = 58b87152-5ce8-491e-ae9c-07caeea3fefb
mon_initial_members = lb1, cephfs1, cephfs2
mon_host = 192.168.1.58,192.168.1.70,192.168.1.72
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true
OSD dump:
epoch 750
fsid 58b87152-5ce8-491e-ae9c-07caeea3fefb
created 2013-09-12 13:13:02.695411
modified 2013-10-21 14:28:31.780838
flags
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
max_osd 4
osd.0 up in weight 1 up_from 741 up_thru 748 down_at 739
last_clean_interval [614,738) 192.168.1.70:6802/12325
192.168.1.70:6803/12325 192.168.1.70:6804/12325 192.168.1.70:6805/12325
exists,up d59119d5-bccb-43ea-be64-9d2272605617
osd.1 up in weight 1 up_from 748 up_thru 748 down_at 745
last_clean_interval [20,744) 192.168.1.72:6800/4271 192.168.1.72:6801/4271
192.168.1.72:6802/4271 192.168.1.72:6803/4271 exists,up
930c097a-f68b-4f9c-a6a1-6787a1382a41
pg_temp 0.12 [1,0,3]
pg_temp 0.16 [1,0,3]
pg_temp 0.18 [1,0,3]
pg_temp 1.11 [1,0,3]
pg_temp 1.15 [1,0,3]
pg_temp 1.17 [1,0,3]
Slowdowns push the load average on my nginx servers to around 40, and access to
the CephFS mount becomes incredibly slow. These slowdowns happen about once a
week, and I typically resolve them by restarting the MDS.
When the cluster gets slow I see the following in my logs:
2013-10-21 14:33:54.079200 7f6301e10700 0 log [WRN] : slow request
30.281651 seconds old, received at 2013-10-21 14:33:23.797488:
osd_op(mds.0.8:16266 100004094c4.00000000 [tmapup 0~0] 1.91102783 e750) v4
currently commit sent
2013-10-21 14:33:54.079191 7f6301e10700 0 log [WRN] : 6 slow requests, 6
included below; oldest blocked for > 30.281651 secs
If this is the only kind of slow request you see (tmapup reports),
then it looks like the MDS is flushing out directory updates and the
OSD is taking a long time to process them. I'm betting you have very
large directories, so each of those updates takes the OSD a while to
apply, and the MDS gets backed up in the meantime because it's trying
to flush those directories out of memory.
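If you want to see what the OSD is actually chewing on while this happens, you can query its admin socket; a sketch, assuming the default socket path for osd.0 on cephfs1:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight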
Any advice? Would increasing the PG num for data and metadata help? Would
moving the MDS to a host which does not also run an OSD be greatly
beneficial?
Your PG counts are probably fine for a cluster of that size, although
you could try bumping them up by 2x or so. More likely, though, your
CephFS install is not well tuned for the directory sizes you're using.
What's the largest directory you have? Have you tried bumping up your
mds cache size? (And what does the host memory usage look like?)
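For reference, taking a pool from 64 to 128 PGs would be something along these lines (pgp_num should be raised after pg_num so the new placement groups actually get rebalanced):

ceph osd pool set data pg_num 128
ceph osd pool set data pgp_num 128
ceph osd pool set metadata pg_num 128
ceph osd pool set metadata pgp_num 128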
I have lots of directories named like 2013_08 averaging about 50GB each,
just filled with images.
We haven't tuned the mds cache size at all, and memory usage on the MDS
server is generally very high.
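For reference, tuning it would just be a ceph.conf change along these lines (the default is 100000 inodes; the figure below is only an example and needs to be sized against the RAM available on the MDS host), followed by an MDS restart:

[mds]
    mds cache size = 500000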
Thank you, this seems to be a good starting point, and makes sense given
our use case.
Kind regards,
Pieter Steyn
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com