Hi all,
I'm using Ceph as a filestore for my nginx web server, in order to have
shared storage, and redundancy with automatic failover.
The cluster is not high spec, but given my use case (serving lots of images) I
am very disappointed with the throughput I'm getting, and was hoping for
some advice.
I'm using CephFS with the latest Dumpling release on Ubuntu Server 12.04.
Server specs:
CephFS1, CephFS2:
Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
12GB Ram
1x 2TB SATA, XFS (OSD data)
1x 2TB SATA (OSD journal)
Each server runs 1x OSD, 1x MON and 1x MDS.
A third server runs 1x MON for Paxos to work correctly.
All machines are connected via a gigabit switch.
The Ceph config is as follows:
[global]
fsid = 58b87152-5ce8-491e-ae9c-07caeea3fefb
mon_initial_members = lb1, cephfs1, cephfs2
mon_host = 192.168.1.58,192.168.1.70,192.168.1.72
auth_supported = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true
OSD dump:
epoch 750
fsid 58b87152-5ce8-491e-ae9c-07caeea3fefb
created 2013-09-12 13:13:02.695411
modified 2013-10-21 14:28:31.780838
flags
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 owner 0
max_osd 4
osd.0 up in weight 1 up_from 741 up_thru 748 down_at 739
last_clean_interval [614,738) 192.168.1.70:6802/12325
192.168.1.70:6803/12325 192.168.1.70:6804/12325 192.168.1.70:6805/12325
exists,up d59119d5-bccb-43ea-be64-9d2272605617
osd.1 up in weight 1 up_from 748 up_thru 748 down_at 745
last_clean_interval [20,744) 192.168.1.72:6800/4271
192.168.1.72:6801/4271 192.168.1.72:6802/4271 192.168.1.72:6803/4271
exists,up 930c097a-f68b-4f9c-a6a1-6787a1382a41
pg_temp 0.12 [1,0,3]
pg_temp 0.16 [1,0,3]
pg_temp 0.18 [1,0,3]
pg_temp 1.11 [1,0,3]
pg_temp 1.15 [1,0,3]
pg_temp 1.17 [1,0,3]
During these slowdowns the load on my nginx servers climbs to around 40, and
access to the CephFS mount becomes incredibly slow. They happen about once a
week, and I typically clear them by restarting the MDS.
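For reference, this is roughly what I run to restart it (assuming the upstart
jobs from ceph-deploy, with the MDS id matching the hostname):

sudo stop ceph-mds id=cephfs1
sudo start ceph-mds id=cephfs1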
When the cluster gets slow I see the following in my logs:
2013-10-21 14:33:54.079200 7f6301e10700 0 log [WRN] : slow request
30.281651 seconds old, received at 2013-10-21 14:33:23.797488:
osd_op(mds.0.8:16266 100004094c4.00000000 [tmapup 0~0] 1.91102783 e750)
v4 currently commit sent
2013-10-21 14:33:54.079191 7f6301e10700 0 log [WRN] : 6 slow requests, 6
included below; oldest blocked for > 30.281651 secs
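If it would help, I can capture the in-flight ops from the OSD admin socket the
next time this happens, with something along these lines (assuming the default
socket path on my nodes):

sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight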
Any advice would be appreciated. Would increasing pg_num for the data and
metadata pools help? Would moving the MDS to a host that does not also run an
OSD make a noticeable difference?
Please let me know if you need more info.
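If raw numbers would help, I can also run a quick benchmark against the data
pool and report back; I assume something like this would give a usable
baseline:

rados -p data bench 30 write -t 16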
Thank you,
Pieter