Thanks, I've temporarily disabled both scrubbing and deep-scrubbing, and things seem to be getting better.
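For reference, disabling both cluster-wide is just the two OSD flags, roughly (re-enabled later with the corresponding unset commands):

# ceph osd set noscrub
# ceph osd set nodeep-scrub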
I just noticed high traffic on the pool default.rgw.gc:
pool default.rgw.gc id 7
client io 2162 MB/s rd, 0 B/s wr, 3023 op/s rd, 0 op/s wr
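That output is from the ceph osd pool stats command, e.g.:

# ceph osd pool stats default.rgw.gc

(without a pool name it reports all pools)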
There is a lot of data being written via radosgw:
GLOBAL:
    SIZE   AVAIL   RAW USED   %RAW USED   OBJECTS
    385T   142T    243T       63.08       31384k
POOLS:
    NAME                       ID   QUOTA OBJECTS   QUOTA BYTES   USED     %USED   MAX AVAIL   OBJECTS    DIRTY    READ    WRITE   RAW USED
    volumes                    3    N/A             N/A           40352G   72.19   15547G      10352564   10109k   2155M   2574M   118T
    default.rgw.gc             7    N/A             N/A           0        0       15547G      32         32       1820M   5352k   0
    default.rgw.buckets.data   20   N/A             N/A           80596G   72.16   31094G      21235439   20737k   123M    114M    118T
Although read stats on default.rgw.gc are high according to the ceph osd pool stats command, I don't see much rMB/s on the HDDs of any cluster node; however, r/s does seem significant. Any idea why that is?
# iostat -xdm 5 1000
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 2,20 0,00 2,60 0,00 0,02 14,77 0,00 0,00 0,00 0,00 0,00 0,00
sdb 0,00 0,40 35,80 10,40 0,66 1,74 106,04 0,40 8,74 10,48 2,77 6,86 31,68
sdc 0,00 0,60 45,80 15,60 0,78 3,06 127,87 0,63 10,20 12,84 2,46 7,10 43,60
sdd 0,00 0,60 27,00 10,60 0,43 1,79 121,25 0,31 8,19 11,41 0,00 6,60 24,80
sde 0,00 1,40 36,80 18,80 0,65 3,96 169,60 0,51 9,17 12,85 1,96 5,99 33,28
sdf 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdg 0,00 0,40 33,40 7,00 0,52 0,93 73,26 0,41 10,16 12,10 0,91 7,37 29,76
sdh 0,00 1,40 53,60 13,60 0,94 2,62 108,32 0,70 10,45 12,61 1,94 7,42 49,84
sdi 0,00 1,00 34,40 10,80 0,61 2,14 124,82 0,38 8,44 10,37 2,30 6,57 29,68
sdj 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdk 0,00 1,00 39,20 13,20 0,69 2,66 131,13 0,60 11,45 14,31 2,97 7,21 37,76
sdl 0,00 0,80 25,40 11,20 0,45 1,92 132,86 0,29 8,00 10,49 2,36 6,54 23,92
sdm 0,00 1,00 37,20 8,00 0,72 1,46 98,82 0,44 9,73 11,08 3,50 7,86 35,52
nvme0n1 0,00 0,00 0,00 700,60 0,00 20,34 59,47 0,07 0,10 0,00 0,10 0,02 1,44
dm-0 0,00 0,00 0,00 4,80 0,00 0,02 8,00 0,00 0,00 0,00 0,00 0,00 0,00
Can I schedule the garbage collection process to run on particular days/hours?
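From what I can tell there is no built-in way to pin RGW garbage collection to particular days or hours, so I'm guessing the realistic options are tuning the rgw_gc_* settings and/or triggering a cycle manually (e.g. from cron). A rough sketch of what I have in mind, values are illustrative only (roughly the defaults as I understand them):

# in ceph.conf, rgw section
# minimum time (seconds) a deleted object waits before it is eligible for GC
rgw_gc_obj_min_wait = 7200
# how often the GC thread starts a new cycle (seconds)
rgw_gc_processor_period = 3600
# maximum runtime of a single GC cycle (seconds)
rgw_gc_processor_max_time = 3600
# number of GC queue shard objects (presumably the 32 objects visible in default.rgw.gc above)
rgw_gc_max_objs = 32

# inspect the pending GC queue / trigger a cycle by hand
radosgw-admin gc list --include-all
radosgw-admin gc process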
rados bench looks quite fine while garbage collection is running, I think:
# rados bench -p benchmark_replicated 20 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 20 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_175315
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 364 348 1391.88 1392 0.0638181 0.0443111
2 16 711 695 1389.83 1388 0.0166734 0.0454365
3 16 1052 1036 1381.15 1364 0.0756525 0.0458462
4 16 1401 1385 1384.82 1396 0.0800253 0.0459712
5 16 1764 1748 1398.21 1452 0.048202 0.0455071
6 16 2117 2101 1400.48 1412 0.0154943 0.0455181
7 16 2468 2452 1400.96 1404 0.0324203 0.0454904
8 16 2809 2793 1396.32 1364 0.0425057 0.0456571
9 16 3175 3159 1403.82 1464 0.0535376 0.0454215
10 16 3541 3525 1409.82 1464 0.0668655 0.0452501
11 16 3911 3895 1416.18 1480 0.0506286 0.0450723
12 16 4267 4251 1416.82 1424 0.0266732 0.0450444
13 16 4615 4599 1414.9 1392 0.101581 0.045124
14 16 4977 4961 1417.25 1448 0.0342007 0.0450346
15 16 5351 5335 1422.48 1496 0.022117 0.0449238
16 16 5691 5675 1418.57 1360 0.022683 0.0450504
17 16 6035 6019 1416.05 1376 0.0702069 0.0451103
18 16 6397 6381 1417.82 1448 0.0231964 0.0450781
19 16 6750 6734 1417.5 1412 0.0131453 0.0450462
2018-02-01 08:30:57.941618 min lat: 0.0117176 max lat: 0.794775 avg lat: 0.0451095
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
20 16 7100 7084 1416.62 1400 0.0239063 0.0451095
Total time run: 20.040338
Total writes made: 7100
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1417.14
Stddev Bandwidth: 40.6598
Max bandwidth (MB/sec): 1496
Min bandwidth (MB/sec): 1360
Average IOPS: 354
Stddev IOPS: 10
Max IOPS: 374
Min IOPS: 340
Average Latency(s): 0.0451394
Stddev Latency(s): 0.0264402
Max latency(s): 0.794775
Min latency(s): 0.0117176
Cleaning up (deleting benchmark objects)
Removed 7100 objects
Clean up completed and total clean up time :0.658175
Thanks
Jakub
On Thu, Feb 1, 2018 at 12:43 AM, Sergey Malinin <hell@xxxxxxxxxxx> wrote:
Deep scrub is I/O-expensive. If deep scrub is unnecessary, you can disable it with "ceph osd pool set <poolname> nodeep-scrub".

On Thursday, February 1, 2018 at 00:10, Jakub Jaszewski wrote:
3 active+clean+scrubbing+deep