Ceph hunting for monitor on load

Erwin Lubbers <ceph@xxxxxxxxxxxxxxxxx> · Wed, 28 Jan 2015 17:45:35 +0100

Hi,

I'm running a small Ceph cluster (Emperor), with 3 servers, each running a monitor and two 280 GB OSDs (plus an SSD for the journals). Servers have 16 GB memory and a 8 core Xeon processor and are connected with 3x 1 gbps (lacp trunk).

As soon as I give the cluster some load from a client server (same specs only 2x 1 gbps) using:

rados -p test bench 20 write

it starts hunting for a new monitor (on low load it doesn't do that). 

ceph -w tells:

2015-01-28 17:31:52.998325 mon.0 [INF] pgmap v13238085: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 130 kB/s wr, 11 op/s
2015-01-28 17:31:54.007069 mon.0 [INF] pgmap v13238086: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 33734 B/s wr, 2 op/s
2015-01-28 17:31:55.029147 mon.0 [INF] pgmap v13238087: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 72610 B/s wr, 6 op/s
2015-01-28 17:31:58.235334 mon.0 [INF] pgmap v13238088: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 54383 B/s wr, 3 op/s
2015-01-28 17:31:59.961473 mon.0 [INF] pgmap v13238089: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 26625 B/s wr, 1 op/s
2015-01-28 17:19:09.179620 mon.2 [INF] mon.c01 calling new monitor election
2015-01-28 17:19:23.717863 mon.2 [INF] mon.c01 calling new monitor election
2015-01-28 17:32:14.617226 mon.2 [INF] mon.c01 calling new monitor election
2015-01-28 17:32:14.867814 mon.0 [INF] pgmap v13238091: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 389 kB/s wr, 18 op/s
2015-01-28 17:32:14.884187 mon.0 [INF] mon.c03 calling new monitor election
2015-01-28 17:32:20.537763 mon.0 [INF] mon.c03@0 won leader election with quorum 0,2
2015-01-28 17:32:22.999217 mon.0 [INF] pgmap v13238091: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 389 kB/s wr, 18 op/s
2015-01-28 17:32:22.999305 mon.0 [INF] mdsmap e1: 0/0/1 up
2015-01-28 17:32:22.999408 mon.0 [INF] osdmap e947: 6 osds: 6 up, 6 in
2015-01-28 17:32:14.614519 mon.1 [INF] mon.c02 calling new monitor election
2015-01-28 17:32:22.999510 mon.0 [INF] monmap e3: 3 mons at {c01=10.63.201.250:6789/0,c02=10.63.201.249:6789/0,c03=10.63.201.248:6789/0}
2015-01-28 17:32:24.593304 mon.0 [INF] pgmap v13238092: 640 pgs: 640 active+clean; 296 GB data, 878 GB used, 796 GB / 1675 GB avail; 3531 kB/s rd, 8656 kB/s wr, 177 op/s
2015-01-28 17:32:24.709239 mon.0 [INF] mon.c03 calling new monitor election
2015-01-28 17:32:24.710870 mon.2 [INF] mon.c01 calling new monitor election
2015-01-28 17:32:25.794186 mon.0 [INF] mon.c03@0 won leader election with quorum 0,1,2
2015-01-28 17:32:25.811166 mon.0 [INF] pgmap v13238092: 640 pgs: 640 active+clean; 296 GB data, 878 GB used, 796 GB / 1675 GB avail; 3531 kB/s rd, 8656 kB/s wr, 177 op/s
2015-01-28 17:32:25.811238 mon.0 [INF] mdsmap e1: 0/0/1 up
2015-01-28 17:32:25.811328 mon.0 [INF] osdmap e947: 6 osds: 6 up, 6 in
2015-01-28 17:32:25.811467 mon.0 [INF] monmap e3: 3 mons at {c01=10.63.201.250:6789/0,c02=10.63.201.249:6789/0,c03=10.63.201.248:6789/0}
2015-01-28 17:32:25.888326 mon.0 [INF] pgmap v13238093: 640 pgs: 640 active+clean; 296 GB data, 879 GB used, 796 GB / 1675 GB avail; 5380 kB/s rd, 15689 kB/s wr, 349 op/s
2015-01-28 17:32:28.019105 mon.0 [INF] pgmap v13238094: 640 pgs: 640 active+clean; 296 GB data, 879 GB used, 796 GB / 1675 GB avail; 12328 kB/s rd, 41132 kB/s wr, 1052 op/s
2015-01-28 17:32:29.028272 mon.0 [INF] pgmap v13238095: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 796 GB / 1675 GB avail; 6861 kB/s rd, 16312 kB/s wr, 458 op/s
2015-01-28 17:32:30.056662 mon.0 [INF] pgmap v13238096: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 796 GB / 1675 GB avail; 7366 kB/s rd, 10759 kB/s wr, 484 op/s
2015-01-28 17:32:33.014282 mon.0 [INF] pgmap v13238097: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 796 GB / 1675 GB avail; 4070 kB/s rd, 2030 kB/s wr, 304 op/s
2015-01-28 17:32:34.040452 mon.0 [INF] pgmap v13238098: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 2306 kB/s rd, 785 kB/s wr, 221 op/s
2015-01-28 17:32:35.050140 mon.0 [INF] pgmap v13238099: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 1339 kB/s rd, 396 kB/s wr, 162 op/s
2015-01-28 17:32:38.013409 mon.0 [INF] pgmap v13238100: 640 pgs: 640 active+clean; 295 GB data, 878 GB used, 797 GB / 1675 GB avail; 100713 B/s rd, 139 kB/s wr, 15 op/s

And the first election message has a timestamp in the past, but all Ceph servers have ntp synchronized clocks. And before that first is shown on the console the logging is quiet for at least 10 seconds.

The output of the benchmark shows:

 Maintaining 16 concurrent writes of 4194304 bytes for up to 20 seconds or 0 objects
 Object prefix: benchmark_data_k01.management_2503
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        16         0         0         0         -         0
     2      16        16         0         0         0         -         0
     3      16        17         1   1.33312   1.33333   2.64579   2.64579
     4      16        17         1  0.999855         0         -   2.64579
     5      16        20         4   3.19957         6   4.85805   4.23818
     6      16        30        14   9.33206        40   5.85955   5.13916
     7      16        31        15   8.57026         4   6.43014   5.22523
     8      16        31        15   7.49898         0         -   5.22523
     9      16        31        15   6.66579         0         -   5.22523
    10      16        44        28   11.1986   17.3333   4.12193   5.01271
    11      16        45        29   10.5441         4   4.37712   4.99079
    12      16        56        40   13.3294        44   2.46627   4.35678
    13      16        56        40   12.3042         0         -   4.35678
    14      16        56        40   11.4255         0         -   4.35678
    15      16        61        45   11.9945   6.66667   5.45991   4.43916
    16      16        70        54    13.494        36  0.820473   4.30012
    17      16        74        58   13.6413        16   1.45099   4.09778
    18      15        83        68   15.1046        40   3.14914   3.90607
    19      16        92        76   15.9933        32   1.39559    3.8017
2015-01-28 17:37:22.439872 7fcff025e700  0 monclient: hunting for new mon
2015-01-28 17:37:22.953355min lat: 0.820473 max lat: 9.24111 avg lat: 3.72117
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      16        95        79   15.7932        12   1.74725   3.72117
    21      16        96        80   15.2318         4   2.43613    3.7051
    22      16        96        80   14.5396         0         -    3.7051
    23      16        96        80   13.9075         0         -    3.7051
    24      16        96        80   13.3282         0         -    3.7051

Total time run:         24.832426
Total writes made:      96
Write size:             4194304
Bandwidth (MB/sec):     15.464

Stddev Bandwidth:       15.156
Max bandwidth (MB/sec): 44
Min bandwidth (MB/sec): 0
Average Latency:        4.13656
Stddev Latency:         1.906
Max latency:            9.24111
Min latency:            0.820473

ipref tests show the client and server machine can do around 935 mbps.

Any ideas on the monitor hunting?

Regards,
Erwin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com