Hi,
Thanks for providing guidance.
VD0 is the SSD drive.
Many people suggested not enabling write-back (WB) on the SSD VD, so that the controller cache can be used for the HDDs, where it is needed more.
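For reference, if I have the megacli syntax right, the per-VD cache policy can be checked and changed along these lines (the VD and adapter numbers below are just placeholders for my layout):
megacli -LDGetProp -cache -Lall -a0
megacli -LDSetProp WT -L0 -a0   # write-through on the SSD VD
megacli -LDSetProp WB -L1 -a0   # write-back on an HDD VD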
The setup is three identical Dell R620 servers: OSD01, OSD02, OSD04.
Separate 10 Gb networks, 600 GB enterprise HDDs, 320 GB enterprise SSDs.
BlueStore, with separate WAL/DB on the SSD (1 GB partition for WAL, 30 GB for DB).
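For context, an OSD with this layout gets created roughly like this with ceph-volume (device names here are placeholders for my partitioning):
ceph-volume lvm create --bluestore --data /dev/sdb --block.wal /dev/sdc1 --block.db /dev/sdc2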
With 2 OSDs per server and only OSD01 and OSD02 in, performance is as expected (no gaps in cur MB/s).
Adding one OSD from OSD04 tanks performance (lots of gaps with cur MB/s at 0).
See below:
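The write benchmark output further down is from rados bench; with 4 MB objects and 16 concurrent ops the runs correspond roughly to:
rados bench -p <pool> 120 write -t 16 -b 4194304
(pool name omitted; -t 16 and 4 MB objects are also the defaults)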
ceph -s
cluster:
id: 1e98e57a-ef41-4327-b88a-dd2531912632
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
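(The noscrub/nodeep-scrub flags in the warning are the cluster-wide flags set with "ceph osd set noscrub" / "ceph osd set nodeep-scrub"; they can be cleared again with the matching "ceph osd unset" commands.)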
WITH OSD04
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.87256 root default
-7 1.14899 host osd02
0 hdd 0.57500 osd.0 up 1.00000 1.00000
1 hdd 0.57500 osd.1 up 1.00000 1.00000
-3 1.14899 host osd03
2 hdd 0.57500 osd.2 up 1.00000 1.00000
3 hdd 0.57500 osd.3 up 1.00000 1.00000
-4 0.57458 host osd04
4 hdd 0.57458 osd.4 up 1.00000 1.00000
2018-04-10 08:37:08.111037 min lat: 0.0128562 max lat: 13.1623 avg lat: 0.528273
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
100 16 3001 2985 119.388 90 0.0169507 0.528273
101 16 3029 3013 119.315 112 0.0410565 0.524325
102 16 3029 3013 118.145 0 - 0.524325
103 16 3029 3013 116.998 0 - 0.524325
104 16 3029 3013 115.873 0 - 0.524325
105 16 3071 3055 116.37 42 0.0888923 0.54832
106 16 3156 3140 118.479 340 0.0162464 0.535244
107 16 3156 3140 117.372 0 - 0.535244
108 16 3156 3140 116.285 0 - 0.535244
109 16 3156 3140 115.218 0 - 0.535244
110 16 3156 3140 114.171 0 - 0.535244
111 16 3156 3140 113.142 0 - 0.535244
112 16 3156 3140 112.132 0 - 0.535244
113 16 3156 3140 111.14 0 - 0.535244
114 16 3156 3140 110.165 0 - 0.535244
115 16 3156 3140 109.207 0 - 0.535244
116 16 3230 3214 110.817 29.6 0.0169969 0.574856
117 16 3311 3295 112.639 324 0.0704851 0.565529
118 16 3311 3295 111.684 0 - 0.565529
119 16 3311 3295 110.746 0 - 0.565529
2018-04-10 08:37:28.112886 min lat: 0.0128562 max lat: 14.7293 avg lat: 0.565529
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
120 16 3311 3295 109.823 0 - 0.565529
121 16 3311 3295 108.915 0 - 0.565529
122 16 3311 3295 108.022 0 - 0.565529
Total time run: 122.568983
Total writes made: 3312
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 108.086
Stddev Bandwidth: 121.191
Max bandwidth (MB/sec): 520
Min bandwidth (MB/sec): 0
Average IOPS: 27
Stddev IOPS: 30
Max IOPS: 130
Min IOPS: 0
Average Latency(s): 0.591771
Stddev Latency(s): 1.74753
Max latency(s): 14.7293
Min latency(s): 0.0128562
AFTER ceph osd down osd.4; ceph osd out osd.4
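(osd.4 can be brought back into placement afterwards with "ceph osd in osd.4".)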
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.87256 root default
-7 1.14899 host osd02
0 hdd 0.57500 osd.0 up 1.00000 1.00000
1 hdd 0.57500 osd.1 up 1.00000 1.00000
-3 1.14899 host osd03
2 hdd 0.57500 osd.2 up 1.00000 1.00000
3 hdd 0.57500 osd.3 up 1.00000 1.00000
-4 0.57458 host osd04
4 hdd 0.57458 osd.4 up 0 1.00000
2018-04-10 08:46:55.193642 min lat: 0.0156532 max lat: 2.5884 avg lat: 0.310681
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
100 16 5144 5128 205.097 220 0.0372222 0.310681
101 16 5196 5180 205.126 208 0.421245 0.310908
102 16 5232 5216 204.526 144 0.543723 0.311544
103 16 5271 5255 204.055 156 0.465998 0.312394
104 16 5310 5294 203.593 156 0.483188 0.313355
105 16 5357 5341 203.444 188 0.0313209 0.313267
106 16 5402 5386 203.223 180 0.517098 0.313714
107 16 5457 5441 203.379 220 0.0277359 0.313288
108 16 5515 5499 203.644 232 0.470556 0.313203
109 16 5565 5549 203.611 200 0.564713 0.313173
110 16 5606 5590 203.25 164 0.0223166 0.313596
111 16 5659 5643 203.329 212 0.0231103 0.313597
112 16 5703 5687 203.085 176 0.033348 0.314018
113 16 5757 5741 203.199 216 1.53862 0.313991
114 16 5798 5782 202.855 164 0.4711 0.314511
115 16 5852 5836 202.969 216 0.0350226 0.31424
116 16 5912 5896 203.288 240 0.0253188 0.313657
117 16 5964 5948 203.328 208 0.0223623 0.313562
118 16 6024 6008 203.639 240 0.174245 0.313531
119 16 6070 6054 203.473 184 0.712498 0.313582
2018-04-10 08:47:15.195873 min lat: 0.0154679 max lat: 2.5884 avg lat: 0.313564
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
120 16 6120 6104 203.444 200 0.0351212 0.313564
Total time run: 120.551897
Total writes made: 6120
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 203.066
Stddev Bandwidth: 43.8329
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 128
Average IOPS: 50
Stddev IOPS: 10
Max IOPS: 120
Min IOPS: 32
Average Latency(s): 0.314959
Stddev Latency(s): 0.379298
Max latency(s): 2.5884
Min latency(s): 0.0154679
On Tue, 10 Apr 2018 at 07:58, Kai Wagner <kwagner@xxxxxxxx> wrote:
Is this just from one server or from all servers? Just wondering why VD
0 is using WriteThrough compared to the others. If that's the setup for
the OSD's you already have a cache setup problem.
On 10.04.2018 13:44, Mohamad Gebai wrote:
> megacli -LDGetProp -cache -Lall -a0
>
> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough,
> ReadAheadNone, Direct, Write Cache OK if bad BBU
> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
> Cached, No Write Cache if bad BBU
> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
> Cached, No Write Cache if bad BBU
> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
> Cached, No Write Cache if bad BBU
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com