Re: Monitors repeatedly calling for new elections

Apologies for replying to myself; I wanted to add a bit more information.  We don't think the Ceph cluster itself is the issue, but possibly something on the clients (a bad configuration setting?  A bug in our older ceph client?).  I've attached our CrushMap and OSD tree as well.  Neither /var/log/messages nor dmesg shows anything interesting.
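
(For reference, the attached CrushMap and OSD tree were dumped with the usual CLI; roughly the following -- file names are just examples:)

$ ceph osd getcrushmap -o crush.bin     # binary crush map from the cluster
$ crushtool -d crush.bin -o crush.txt   # decompile to the text form appended below
$ ceph osd tree                         # up/down state and weights, also appended below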

Below is typical of what I'll see.  It doesn't happen every time the system is active, but if I run several tests during the day I will certainly see it.  During the tests we also often find service times quite high:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    46.50    0.00   17.20     0.00   509.60    29.63     1.74  101.19   1.33   2.28
rbd2              0.00     0.00    0.00    0.40     0.00    37.40    93.50     1.01   36.00 2500.00 100.00  <-----
rbd3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd5              0.00     0.00    0.00    0.20     0.00     5.00    25.00     0.00   18.00  18.00   0.36
rbd6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.00    0.00   0.00 100.00
rbd7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.00    0.00   0.00 100.00
rbd8              0.00     0.00    0.00    0.10     0.00     5.00    50.00     0.00   20.00  20.00   0.20
rbd10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd13             0.00     0.00    0.70    0.70    41.10    18.20    42.36     0.03   24.86  24.86   3.48
rbd14             0.00     0.00    0.00    0.50     0.00    30.60    61.20     1.02   48.00 2000.00 100.00  <-----
rbd16             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd18             0.00     0.00    0.40    1.10     0.80    41.20    28.00     1.04   29.33 666.67 100.00
rbd19             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd21             0.00     0.00    0.40    0.80     2.00    28.80    25.67     2.04   31.67 833.33 100.00
rbd22             0.00     0.00    0.00    0.60     0.00    30.60    51.00     0.02   32.67  32.67   1.96
rbd23             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd24             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
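
(The extended device statistics above come from iostat; roughly the following, filtered to the local disk and the rbd devices -- the interval is arbitrary:)

$ iostat -x 5 | egrep 'Device|sda|rbd'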

And from 'ceph -w' around that same time:

2014-12-09 11:43:07.446952 osd.14 [WRN] 22 slow requests, 4 included below; oldest blocked for > 43.561110 secs
2014-12-09 11:43:07.446958 osd.14 [WRN] slow request 30.322788 seconds old, received at 2014-12-09 11:42:37.124101: osd_op(client.8232.1:18380981 rb.0.195f.238e1f29.000000000d62 [write 0~12288] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                                 
2014-12-09 11:43:07.446966 osd.14 [WRN] slow request 30.321805 seconds old, received at 2014-12-09 11:42:37.125084: osd_op(client.8232.1:18380985 rb.0.195f.238e1f29.000000000d62 [write 12288~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                            
2014-12-09 11:43:07.446972 osd.14 [WRN] slow request 30.321003 seconds old, received at 2014-12-09 11:42:37.125886: osd_op(client.8232.1:18380986 rb.0.195f.238e1f29.000000000d62 [write 520192~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                           
2014-12-09 11:43:07.446977 osd.14 [WRN] slow request 30.320158 seconds old, received at 2014-12-09 11:42:37.126731: osd_op(client.8232.1:18380987 rb.0.195f.238e1f29.000000000d62 [write 1028096~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                          
2014-12-09 11:43:08.447263 osd.14 [WRN] 26 slow requests, 4 included below; oldest blocked for > 44.561424 secs                                                                                                      
2014-12-09 11:43:08.447271 osd.14 [WRN] slow request 30.995124 seconds old, received at 2014-12-09 11:42:37.452079: osd_op(client.8232.1:18381339 rb.0.195f.238e1f29.000000000d62 [write 1536000~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                          
2014-12-09 11:43:08.447276 osd.14 [WRN] slow request 30.994609 seconds old, received at 2014-12-09 11:42:37.452594: osd_op(client.8232.1:18381340 rb.0.195f.238e1f29.000000000d62 [write 2043904~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                          
2014-12-09 11:43:08.447281 osd.14 [WRN] slow request 30.993274 seconds old, received at 2014-12-09 11:42:37.453929: osd_op(client.8232.1:18381342 rb.0.195f.238e1f29.000000000d62 [write 2551808~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                          
2014-12-09 11:43:08.447286 osd.14 [WRN] slow request 30.878378 seconds old, received at 2014-12-09 11:42:37.568825: osd_op(client.8232.1:18381520 rb.0.195f.238e1f29.000000000d62 [write 3059712~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33                                                                                                                                                          
2014-12-09 11:43:09.447485 osd.14 [WRN] 28 slow requests, 2 included below; oldest blocked for > 45.561633 secs
2014-12-09 11:43:09.447491 osd.14 [WRN] slow request 30.963411 seconds old, received at 2014-12-09 11:42:38.484001: osd_op(client.8232.1:18382150 rb.0.195f.238e1f29.000000000d62 [write 3567616~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:09.447498 osd.14 [WRN] slow request 30.855218 seconds old, received at 2014-12-09 11:42:38.592194: osd_op(client.8232.1:18382263 rb.0.195f.238e1f29.000000000d62 [write 4075520~118784] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:10.447678 osd.14 [WRN] 34 slow requests, 6 included below; oldest blocked for > 46.561824 secs
2014-12-09 11:43:10.447685 osd.14 [WRN] slow request 30.379322 seconds old, received at 2014-12-09 11:42:40.068281: osd_op(client.8232.1:18383364 rb.0.196e.238e1f29.000000000d67 [write 0~290816] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447691 osd.14 [WRN] slow request 30.319496 seconds old, received at 2014-12-09 11:42:40.128107: osd_op(client.8232.1:18383434 rb.0.196e.238e1f29.000000000d67 [write 290816~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447696 osd.14 [WRN] slow request 30.295299 seconds old, received at 2014-12-09 11:42:40.152304: osd_op(client.8232.1:18383463 rb.0.196e.238e1f29.000000000d67 [write 798720~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447701 osd.14 [WRN] slow request 30.294923 seconds old, received at 2014-12-09 11:42:40.152680: osd_op(client.8232.1:18383464 rb.0.196e.238e1f29.000000000d67 [write 1306624~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447707 osd.14 [WRN] slow request 30.291513 seconds old, received at 2014-12-09 11:42:40.156090: osd_op(client.8232.1:18383471 rb.0.196e.238e1f29.000000000d67 [write 1814528~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:11.447889 osd.14 [WRN] 39 slow requests, 6 included below; oldest blocked for > 47.562034 secs
2014-12-09 11:43:11.447897 osd.14 [WRN] slow request 31.226020 seconds old, received at 2014-12-09 11:42:40.221793: osd_op(client.8232.1:18383506 rb.0.196e.238e1f29.000000000d67 [write 2322432~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:11.447901 osd.14 [WRN] slow request 31.191602 seconds old, received at 2014-12-09 11:42:40.256211: osd_op(client.8232.1:18383522 rb.0.196e.238e1f29.000000000d67 [write 2830336~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:11.447906 osd.14 [WRN] slow request 30.369915 seconds old, received at 2014-12-09 11:42:41.077898: osd_op(client.8229.1:43030061 rb.0.1917.238e1f29.000000000d6e [write 0~307200] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:11.447910 osd.14 [WRN] slow request 30.369655 seconds old, received at 2014-12-09 11:42:41.078158: osd_op(client.8229.1:43030063 rb.0.1917.238e1f29.000000000d6e [write 307200~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:11.447916 osd.14 [WRN] slow request 30.369362 seconds old, received at 2014-12-09 11:42:41.078451: osd_op(client.8229.1:43030064 rb.0.1917.238e1f29.000000000d6e [write 815104~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448075 osd.14 [WRN] 44 slow requests, 6 included below; oldest blocked for > 48.562228 secs
2014-12-09 11:43:12.448082 osd.14 [WRN] slow request 31.369335 seconds old, received at 2014-12-09 11:42:41.078672: osd_op(client.8229.1:43030065 rb.0.1917.238e1f29.000000000d6e [write 1323008~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448087 osd.14 [WRN] slow request 31.358194 seconds old, received at 2014-12-09 11:42:41.089813: osd_op(client.8229.1:43030097 rb.0.1917.238e1f29.000000000d6e [write 1830912~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448093 osd.14 [WRN] slow request 31.357607 seconds old, received at 2014-12-09 11:42:41.090400: osd_op(client.8229.1:43030098 rb.0.1917.238e1f29.000000000d6e [write 2338816~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448097 osd.14 [WRN] slow request 31.357285 seconds old, received at 2014-12-09 11:42:41.090722: osd_op(client.8229.1:43030099 rb.0.1917.238e1f29.000000000d6e [write 2846720~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448102 osd.14 [WRN] slow request 31.295790 seconds old, received at 2014-12-09 11:42:41.152217: osd_op(client.8232.1:18384206 rb.0.196e.238e1f29.000000000d67 [write 3338240~507904] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:13.448356 osd.14 [WRN] 46 slow requests, 3 included below; oldest blocked for > 49.562511 secs
2014-12-09 11:43:13.448364 osd.14 [WRN] slow request 32.188093 seconds old, received at 2014-12-09 11:42:41.260197: osd_op(client.8232.1:18384295 rb.0.196e.238e1f29.000000000d67 [write 3846144~348160] 3.9208bbf0 ondisk+write e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:13.448369 osd.14 [WRN] slow request 32.138680 seconds old, received at 2014-12-09 11:42:41.309610: osd_op(client.8229.1:43030315 rb.0.1917.238e1f29.000000000d6e [write 3354624~507904] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:13.448374 osd.14 [WRN] slow request 32.090464 seconds old, received at 2014-12-09 11:42:41.357826: osd_op(client.8229.1:43030374 rb.0.1917.238e1f29.000000000d6e [write 3862528~331776] 3.785054cc ondisk+write e499) v4 currently waiting for subops from 0,33
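
(To dig further into where these requests are stuck, something like the following against the affected OSD should help; the admin-socket path and OSD id are just examples:)

$ ceph health detail | grep -i slow
$ ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok dump_ops_in_flight   # ops currently blocked
$ ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok dump_historic_ops    # recently completed slow ops
$ ceph osd perf                                                           # per-OSD commit/apply latency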





________________________________________
From: Sanders, Bill
Sent: Tuesday, December 09, 2014 9:19 AM
To: Rodrigo Severo
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: RE:  Monitors repeatedly calling for new elections

Thanks for the response.  I did forget to mention that NTP is set up and does appear to be running (just double-checked).

Is this good enough resolution?

$ for node in $nodes; do ssh tvsa${node} sudo date --rfc-3339=ns; done
2014-12-09 09:15:39.404292557-08:00
2014-12-09 09:15:39.521762397-08:00
2014-12-09 09:15:39.641491188-08:00
2014-12-09 09:15:39.761937524-08:00
2014-12-09 09:15:39.911416676-08:00
2014-12-09 09:15:40.029777457-08:00
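
(A tighter check than wall-clock ssh would be to ask each node's ntpd for its measured offset, and to let the monitors flag any drift beyond mon_clock_drift_allowed; roughly, with $nodes as above:)

$ for node in $nodes; do ssh tvsa${node} ntpq -pn; done   # offset column is in milliseconds
$ ceph health detail | grep -i skew                       # monitors warn on clock skew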

Bill
________________________________________
From: Rodrigo Severo [rodrigo@xxxxxxxxxxxxxxxxxxx]
Sent: Tuesday, December 09, 2014 4:02 AM
To: Sanders, Bill
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Monitors repeatedly calling for new elections

On Mon, Dec 8, 2014 at 5:23 PM, Sanders, Bill <Bill.Sanders@xxxxxxxxxxxx> wrote:

> Under activity, we'll get monitors going into election cycles repeatedly,
> OSD's being "wrongly marked down", as well as slow requests "osd.11
> 39.7.48.6:6833/21938 failed (3 reports from 1 peers after 52.914693 >= grace
> 20.000000)" .  During this, ceph -w shows the cluster essentially idle.
> None of the network, disks, or cpu's ever appear to max out.  It also
> doesn't appear to be the same OSD's, MON's, or node causing the problem.
> Top reports all 128 GB RAM (negligible swap) in use on the storage nodes.
> Only Ceph is running on the storage nodes.


I'm really new to Ceph, but my first bet is that your computers aren't
clock-synchronized. Do all of them have working ntpds?


Regards,

Rodrigo Severo
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host tvsaq1 {
	id -2		# do not change unnecessarily
	# weight 41.400
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 3.450
	item osd.1 weight 3.450
	item osd.2 weight 3.450
	item osd.3 weight 3.450
	item osd.4 weight 3.450
	item osd.5 weight 3.450
	item osd.6 weight 3.450
	item osd.7 weight 3.450
	item osd.8 weight 3.450
	item osd.9 weight 3.450
	item osd.10 weight 3.450
	item osd.11 weight 3.450
}
host tvsaq2 {
	id -3		# do not change unnecessarily
	# weight 41.400
	alg straw
	hash 0	# rjenkins1
	item osd.12 weight 3.450
	item osd.13 weight 3.450
	item osd.14 weight 3.450
	item osd.15 weight 3.450
	item osd.16 weight 3.450
	item osd.17 weight 3.450
	item osd.18 weight 3.450
	item osd.19 weight 3.450
	item osd.20 weight 3.450
	item osd.21 weight 3.450
	item osd.22 weight 3.450
	item osd.23 weight 3.450
}
host tvsar1 {
	id -4		# do not change unnecessarily
	# weight 41.400
	alg straw
	hash 0	# rjenkins1
	item osd.24 weight 3.450
	item osd.25 weight 3.450
	item osd.26 weight 3.450
	item osd.27 weight 3.450
	item osd.28 weight 3.450
	item osd.29 weight 3.450
	item osd.30 weight 3.450
	item osd.31 weight 3.450
	item osd.32 weight 3.450
	item osd.33 weight 3.450
	item osd.34 weight 3.450
	item osd.35 weight 3.450
}
host tvsar2 {
	id -5		# do not change unnecessarily
	# weight 41.400
	alg straw
	hash 0	# rjenkins1
	item osd.36 weight 3.450
	item osd.37 weight 3.450
	item osd.38 weight 3.450
	item osd.39 weight 3.450
	item osd.40 weight 3.450
	item osd.41 weight 3.450
	item osd.42 weight 3.450
	item osd.43 weight 3.450
	item osd.44 weight 3.450
	item osd.45 weight 3.450
	item osd.46 weight 3.450
	item osd.47 weight 3.450
}
root default {
	id -1		# do not change unnecessarily
	# weight 165.600
	alg straw
	hash 0	# rjenkins1
	item tvsaq1 weight 41.400
	item tvsaq2 weight 41.400
	item tvsar1 weight 41.400
	item tvsar2 weight 41.400
}

# rules
rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}

# end crush map
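
(The replicated_ruleset above places each replica on a distinct host via "chooseleaf firstn 0 type host"; the mapping can be sanity-checked offline with crushtool against the compiled map, roughly as follows -- a pool size of 3 is assumed as an example:)

$ crushtool -c crush.txt -o crush.bin
$ crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-mappings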
# id    weight  type name       up/down reweight
-1      165.6   root default
-2      41.4            host tvsaq1
0       3.45                    osd.0   up      1
1       3.45                    osd.1   up      1
2       3.45                    osd.2   up      1
3       3.45                    osd.3   up      1
4       3.45                    osd.4   up      1
5       3.45                    osd.5   up      1
6       3.45                    osd.6   up      1
7       3.45                    osd.7   up      1
8       3.45                    osd.8   up      1
9       3.45                    osd.9   up      1
10      3.45                    osd.10  up      1
11      3.45                    osd.11  up      1
-3      41.4            host tvsaq2
12      3.45                    osd.12  up      1
13      3.45                    osd.13  up      1
14      3.45                    osd.14  up      1
15      3.45                    osd.15  up      1
16      3.45                    osd.16  up      1
17      3.45                    osd.17  up      1
18      3.45                    osd.18  up      1
19      3.45                    osd.19  up      1
20      3.45                    osd.20  up      1
21      3.45                    osd.21  up      1
22      3.45                    osd.22  up      1
23      3.45                    osd.23  up      1
-4      41.4            host tvsar1
24      3.45                    osd.24  up      1
25      3.45                    osd.25  up      1
26      3.45                    osd.26  up      1
27      3.45                    osd.27  up      1
28      3.45                    osd.28  up      1
29      3.45                    osd.29  up      1
30      3.45                    osd.30  up      1
31      3.45                    osd.31  up      1
32      3.45                    osd.32  up      1
33      3.45                    osd.33  up      1
34      3.45                    osd.34  up      1
35      3.45                    osd.35  up      1
-5      41.4            host tvsar2
36      3.45                    osd.36  up      1
37      3.45                    osd.37  up      1
38      3.45                    osd.38  up      1
39      3.45                    osd.39  up      1
40      3.45                    osd.40  up      1
41      3.45                    osd.41  up      1
42      3.45                    osd.42  up      1
43      3.45                    osd.43  up      1
44      3.45                    osd.44  up      1
45      3.45                    osd.45  up      1
46      3.45                    osd.46  up      1
47      3.45                    osd.47  up      1

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
