Thanks
We'll play with the values a bit and see what happens.
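One low-risk way to try new values is to inject them at runtime instead of restarting the daemons; a minimal sketch, where the option names are ones already in the config below and the numbers are only placeholders, not recommendations:

  # push trial values into all running OSDs without a restart
  ceph tell osd.* injectargs '--osd_scrub_sleep 2.0 --osd_deep_scrub_stride 524288'

  # confirm what one OSD is actually running with (run on the node hosting osd.0)
  ceph daemon osd.0 config show | grep -E 'scrub_sleep|deep_scrub_stride'

Injected values are lost on the next restart, so anything that works should also be written back into ceph.conf.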
Br,
Tuomas
From: Quentin Hartman [mailto:qhartman@xxxxxxxxxxxxxxxxxxx]
Sent: 7. elokuuta 2015 20:32
To: Tuomas Juntunen
Cc: ceph-users
Subject: Re: Flapping OSD's when scrubbing
That kind of behavior is usually caused by the OSDs getting busy enough that they aren't answering heartbeats in a timely fashion. It can also happen if you have any network flakiness and heartbeats are getting lost because of that.
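If you want to rule the network in or out, one quick check is the error/drop counters on the cluster-facing interface and the latency between OSD hosts; a rough sketch, assuming the IPoIB interface is named ib0 (substitute your actual interface and a peer address from your mon map):

  # look for RX/TX errors and drops on the cluster-facing interface
  ip -s link show ib0

  # measure latency and loss to another OSD host while a scrub is running
  ping -c 100 -i 0.2 10.20.50.3

If the drop counters climb or ping times spike while scrubbing, the lost heartbeats are probably network-side rather than CPU-side.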
I think (I'm not positive though) that increasing your heartbeat interval may help. Also, looking at the number of threads you have for your OSDs, that seems potentially problematic. If you've got 24 OSDs per machine and each one is running 12 threads, that's 288 threads on 12 cores for just the requests. Plus the disk threads, plus the filestore op threads... That level of thread contention seems like it might be contributing to missing the heartbeats. But again, that's conjecture. I've not worked with a setup as dense as yours.
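For what it's worth, you can get a rough count of how many threads the OSD daemons are actually spawning straight from /proc; a quick sketch, nothing Ceph-specific about it:

  # sum the thread counts of every running ceph-osd process on this node
  for p in $(pidof ceph-osd); do grep Threads /proc/$p/status; done | awk '{s+=$2} END {print s, "OSD threads total"}'

Comparing that number against the 12 physical cores (24 hardware threads) per node should show how oversubscribed the boxes really are.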
QH
On Fri, Aug 7, 2015 at 11:21 AM, Tuomas Juntunen <tuomas.juntunen@xxxxxxxxxxxxxxx> wrote:
Hi
We are experiencing an annoying problem where scrubs make OSDs flap down and make the Ceph cluster unusable for a couple of minutes.
Our cluster consists of three nodes connected with 40 Gbit InfiniBand using IPoIB, each with 2x 6-core X5670 CPUs and 64 GB of memory.
Each node has 6 SSDs serving as journals for 12 OSDs on 2 TB disks (Fast pools) and another 12 OSDs on 4 TB disks (Archive pools), which keep their journals on the same disk.
It seems that our cluster is constantly scrubbing; we rarely see only active+clean. The current status is below (a quick way to list the scrubbing PGs follows the output).
cluster a2974742-3805-4cd3-bc79-765f2bddaefe
health HEALTH_OK
monmap e16: 4 mons at {lb1=10.20.60.1:6789/0,lb2=10.20.60.2:6789/0,nc1=10.20.50.2:6789/0,nc2=10.20.50.3:6789/0}
election epoch 1838, quorum 0,1,2,3 nc1,nc2,lb1,lb2
mdsmap e7901: 1/1/1 up {0=lb1=up:active}, 4 up:standby
osdmap e104824: 72 osds: 72 up, 72 in
pgmap v12941402: 5248 pgs, 9 pools, 19644 GB data, 4810 kobjects
59067 GB used, 138 TB / 196 TB avail
5241 active+clean
7 active+clean+scrubbing
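(A quick way to list exactly which PGs are scrubbing at any moment; a minimal sketch, and the pg dump column layout can differ between releases:)

  # show PGs currently in a scrubbing state
  ceph pg dump pgs_brief 2>/dev/null | grep -i scrub

  # just count them
  ceph pg dump pgs_brief 2>/dev/null | grep -ci scrub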
When the OSDs go down, the load on a node first climbs during scrubbing; after that some OSDs get marked down and come back up about 30 seconds later. They are not really going down, they are just marked down. Then it takes a couple of minutes for everything to be OK again.
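(One way we can confirm they are only being marked down rather than crashing is to check the logs; a rough sketch, and the exact log wording differs between Ceph releases:)

  # on the OSD nodes: an OSD that was marked down by its peers but never crashed logs this
  grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log

  # on a monitor node: see which peers reported the failures in the cluster log
  grep -i "failed" /var/log/ceph/ceph.log | grep -i osd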
Any suggestions on how to fix this? We can't go to production while this behavior exists.
Our config is below:
[global]
fsid = a2974742-3805-4cd3-bc79-765f2bddaefe
mon_initial_members = lb1,lb2,nc1,nc2
mon_host = 10.20.60.1,10.20.60.2,10.20.50.2,10.20.50.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default pg num = 128
osd pool default pgp num = 128
public network = 10.20.0.0/16
osd_op_threads = 12
osd_op_num_threads_per_shard = 2
osd_op_num_shards = 6
#osd_op_num_sharded_pool_threads = 25
filestore_op_threads = 12
ms_nocrc = true
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false
mon osd min down reporters = 25
[osd]
osd scrub max interval = 1209600
osd scrub min interval = 604800
osd scrub load threshold = 3.0
osd max backfills = 1
osd recovery max active = 1
# IO Scheduler settings
osd scrub sleep = 1.0
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
osd scrub chunk max = 1
osd scrub chunk min = 1
osd deep scrub stride = 1048576
filestore queue max ops = 10000
filestore max sync interval = 30
filestore min sync interval = 29
osd deep scrub interval = 2592000
osd heartbeat grace = 240
osd heartbeat interval = 12
osd mon report interval max = 120
osd mon report interval min = 5
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false
osd crush update on start = false
[client]
rbd cache = true
rbd cache size = 67108864 # 64mb
rbd cache max dirty = 50331648 # 48mb
rbd cache target dirty = 33554432 # 32mb
rbd cache writethrough until flush = true # this is the default
rbd cache max dirty age = 2
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
Br,
Tuomas
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com