I've checked the network: we use IPoIB, all nodes are connected to the same switch, and there are no breaks in connectivity while this happens. A constant ping shows 0.03 – 0.1 ms, which I would say is fine.

This happens almost every time deep scrubbing runs. The load on this particular server climbs to 300+ and its OSDs are marked down. Any suggestions on settings? These are the settings I currently have that might affect this:

[global]
osd_op_threads = 6
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
#osd_op_num_sharded_pool_threads = 25
filestore_op_threads = 6
ms_nocrc = true
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

[osd]
osd scrub load threshold = 0.1
osd max backfills = 1
osd recovery max active = 1
osd scrub sleep = .1
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
osd scrub chunk max = 5
osd deep scrub stride = 1048576
filestore queue max ops = 10000
filestore max sync interval = 30
filestore min sync interval = 29
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false

Br,
T
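A rough way to confirm that deep scrub is what pushes the OSDs past the heartbeat grace, and to buy some headroom while tuning, would be along these lines (the grace value below is only an example, not a recommendation):

    # Temporarily stop deep scrubbing and watch whether the down-marks stop:
    ceph osd set nodeep-scrub
    # ... observe for a while, then re-enable it:
    ceph osd unset nodeep-scrub

    # Give the OSDs more time to answer heartbeats while the disks are busy
    # (osd_heartbeat_grace defaults to 20 seconds; 60 is just an illustration):
    ceph tell osd.* injectargs '--osd-heartbeat-grace 60'

    # Check which scrub-related values the running daemons actually picked up:
    ceph daemon osd.0 config show | grep scrub

If the flapping disappears with nodeep-scrub set, the disks simply cannot keep up with deep scrub plus client I/O at load 300+, and the network is not the problem.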
From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]

This can happen if your OSDs are flapping. Hope your network is stable.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Tuomas Juntunen

Hi

One of our nodes has OSD logs that say "wrongly marked me down" for every OSD at some point. What could be the reason for this? Does anyone have similar experiences? The other nodes work completely fine and they are all identical.

Br,
T
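When every OSD on one node logs "wrongly marked me down", the cluster log on a monitor host is usually the quickest place to see which peers reported the failures; something like the following (paths are the usual package defaults, and the exact wording of the messages varies by release):

    # On a monitor host: which OSDs were reported failed, and by whom:
    grep -i "failed" /var/log/ceph/ceph.log | grep -i osd

    # On the affected node: when each OSD noticed it had been marked down:
    grep -i "wrongly marked me down" /var/log/ceph/ceph-osd.*.log

If the failure reports cluster around deep-scrub windows, that points back at disk load on this node rather than at the network or at the node's peers.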