Flapping OSDs

Hi,

One of my Ceph nodes has flapping OSDs. The network between the nodes is fine; it's a 10GBase-T network and I don't see anything wrong with it, but these OSDs keep going up and down.
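
For reference, this is roughly how I'm checking which OSDs flap and when, from one of the monitor nodes (just a sketch; it assumes the default cluster log at /var/log/ceph/ceph.log):

---
# How often OSDs are being reported failed / coming back up, per the cluster log
grep -E "osd\.[0-9]+ .*(failed|boot)" /var/log/ceph/ceph.log | tail -n 40

# Watch cluster events live while the flapping happens
ceph -w
---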

[root@avatar0-ceph4 ~]# ceph osd tree
# id    weight  type name       up/down reweight
-1      174.7   root default
-2      29.12           host avatar0-ceph2
16      3.64                    osd.16  up      1
17      3.64                    osd.17  up      1
18      3.64                    osd.18  up      1
19      3.64                    osd.19  up      1
20      3.64                    osd.20  up      1
21      3.64                    osd.21  up      1
22      3.64                    osd.22  up      1
23      3.64                    osd.23  up      1
-3      29.12           host avatar0-ceph0
0       3.64                    osd.0   up      1
1       3.64                    osd.1   up      1
2       3.64                    osd.2   up      1
3       3.64                    osd.3   up      1
4       3.64                    osd.4   up      1
5       3.64                    osd.5   up      1
6       3.64                    osd.6   up      1
7       3.64                    osd.7   up      1
-4      29.12           host avatar0-ceph1
8       3.64                    osd.8   up      1
9       3.64                    osd.9   up      1
10      3.64                    osd.10  up      1
11      3.64                    osd.11  up      1
12      3.64                    osd.12  up      1
13      3.64                    osd.13  up      1
14      3.64                    osd.14  up      1
15      3.64                    osd.15  up      1
-5      29.12           host avatar0-ceph3
24      3.64                    osd.24  up      1
25      3.64                    osd.25  up      1
26      3.64                    osd.26  up      1
27      3.64                    osd.27  up      1
28      3.64                    osd.28  up      1
29      3.64                    osd.29  up      1
30      3.64                    osd.30  up      1
31      3.64                    osd.31  up      1
-6      29.12           host avatar0-ceph4
32      3.64                    osd.32  up      1
33      3.64                    osd.33  up      1
34      3.64                    osd.34  up      1
35      3.64                    osd.35  up      1
36      3.64                    osd.36  up      1
37      3.64                    osd.37  up      1
38      3.64                    osd.38  up      1
39      3.64                    osd.39  up      1
-7      29.12           host avatar0-ceph5
40      3.64                    osd.40  up      1
41      3.64                    osd.41  up      1
42      3.64                    osd.42  up      1
43      3.64                    osd.43  up      1
44      3.64                    osd.44  up      1
45      3.64                    osd.45  up      1
46      3.64                    osd.46  up      1
47      3.64                    osd.47  up      1
[root@avatar0-ceph4 ~]#


Here is my ceph.conf
---
[root@avatar0-ceph4 ~]# cat /etc/ceph/ceph.conf
[global]
fsid = 2f0d1928-2ee5-4731-a259-64c0dc16110a
mon_initial_members = avatar0-ceph0, avatar0-ceph1, avatar0-ceph2
mon_host = 172.40.40.100,172.40.40.101,172.40.40.102
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
cluster_network = 172.50.50.0/24
public_network = 172.40.40.0/24
max_open_files = 131072
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30
mon_osd_down_out_interval = 300
mon_osd_report_timeout = 300
mon_osd_min_down_reporters = 3


[osd]
filestore_merge_threshold = 40
filestore_split_multiple = 8
osd_op_threads = 8
osd_max_backfills = 1
osd_recovery_op_priority = 1
osd_recovery_max_active = 1

[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
---
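
I haven't overridden any of the heartbeat settings, but to double-check what the OSDs are actually running with I'm going to dump them via the admin socket on the affected host (a sketch; run on avatar0-ceph4 itself, substituting the OSD id):

---
# Runtime values of the heartbeat / down-reporting settings on a flapping OSD
ceph daemon osd.34 config get osd_heartbeat_grace
ceph daemon osd.34 config get osd_heartbeat_interval
ceph daemon osd.34 config show | grep -E "heartbeat|min_down_report"
---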

Here's a log snippet from osd.34:
---
2017-04-02 22:26:10.371282 7f1064eab700  0 -- 172.50.50.105:6816/117130897 >> 172.50.50.101:6808/190698 pipe(0x156a1b80 sd=124 :46536 s=2 pgs=966 cs=1 l=0 c=0x13ae19c0).fault with nothing to send, going to standby
2017-04-02 22:26:10.371360 7f106ed5c700  0 -- 172.50.50.105:6816/117130897 >> 172.50.50.104:6822/181109 pipe(0x1018c2c0 sd=75 :34196 s=2 pgs=1022 cs=1 l=0 c=0x1098fa20).fault with nothing to send, going to standby
2017-04-02 22:26:10.371393 7f1067ad2700  0 -- 172.50.50.105:6816/117130897 >> 172.50.50.103:6806/118813 pipe(0x166b5c80 sd=34 :34156 s=2 pgs=1041 cs=1 l=0 c=0x10c4bfa0).fault with nothing to send, going to standby
2017-04-02 22:26:10.371739 7f107137b700  0 -- 172.50.50.105:6816/117130897 >> 172.50.50.103:6815/121286 pipe(0xd7eb8c0 sd=192 :43966 s=2 pgs=1042 cs=1 l=0 c=0x12eb7e40).fault with nothing to send, going to standby
2017-04-02 22:26:10.375016 7f1068ff7700  0 -- 172.50.50.105:6816/117130897 >> 172.50.50.104:6812/183825 pipe(0x10f70c00 sd=61 :34442 s=2 pgs=1025 cs=1 l=0 c=0xb10be40).fault with nothing to send, going to standby
2017-04-02 22:26:10.375221 7f107157d700  0 -- 172.50.50.105:6816/117130897 >> 172.50.50.102:6806/66401 pipe(0x10ba78c0 sd=191 :46312 s=2 pgs=988 cs=1 l=0 c=0x6f8c420).fault with nothing to send, going to standby
2017-04-02 22:26:11.041747 7f10885ab700  0 log_channel(default) log [WRN] : map e85725 wrongly marked me down
2017-04-02 22:26:16.427858 7f1062892700  0 -- 172.50.50.105:6807/118130897 >> 172.50.50.105:6811/116133701 pipe(0xd4cb180 sd=257 :6807 s=0 pgs=0 cs=0 l=0 c=0x13ae07e0).accept connect_seq 0 vs existing 0 state connecting
2017-04-02 22:26:16.427897 7f1062993700  0 -- 172.50.50.105:6807/118130897 >> 172.50.50.105:6811/116133701 pipe(0xfb50680 sd=76 :56374 s=4 pgs=0 cs=0 l=0 c=0x174255a0).connect got RESETSESSION but no longer connecting
---
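
The "wrongly marked me down" message and the pipe faults are all on the 172.50.50.x cluster network, so it looks like the heartbeats are getting lost there rather than on the public side. This is what I'm planning to run between the OSD hosts to rule the cluster network out (a sketch; the interface name and the jumbo-frame payload size are assumptions for my setup):

---
# Error/drop counters on the cluster-network NIC (interface name assumed)
ethtool -S eth1 | grep -iE "err|drop|miss"

# If jumbo frames are in use, verify the full MTU makes it end to end
ping -M do -s 8972 -c 5 172.50.50.101

# TCP retransmits can point at a flaky link or switch port
netstat -s | grep -i retrans
---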




