Performance Questions - not only small files

Dear list,

 

our replicated Gluster volumes are much slower than the underlying brick disks, and I wonder whether this is a configuration issue, a conceptual problem with our setup, or simply how slow Gluster is.

 

The setup:

  • Three servers in a ring, connected via IP over InfiniBand with 100 Gb/s on each link
  • 2 x 2 TB U.2 SSDs as RAID1 on a MegaRAID controller, attached via NVMe links, on each node
  • A 3x replicated volume on an LVM thin pool on the SSD RAIDs (roughly as in the volume-create sketch after this list)
  • 2 x 26 cores, lots of RAM
  • The volumes are mounted locally and used as shared disks for computation jobs (CPU and GPU) on the nodes.
  • The LVM thin pool is shared with other volumes and with a cache pool for the hard disks.
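
For reference, the volume layout corresponds roughly to the following sketch; the volume name, hostnames and brick paths are placeholders, not our exact commands:

    # Replica-3 volume, one brick per node, each brick on an LVM thin volume
    # on that node's SSD RAID1 (names and paths are placeholders):
    gluster volume create shared replica 3 \
        node1:/bricks/shared/brick node2:/bricks/shared/brick node3:/bricks/shared/brick
    gluster volume start shared
    # Mounted locally on every node for the compute jobs:
    mount -t glusterfs localhost:/shared /mnt/shared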

 

The measurement:

  • iozone in automatic mode (-a), on the Gluster volume (set1) and on the mounted brick disk (baseline); see the invocation sketch after this list
  • Compared with iozone_results_comparator.py
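
For illustration, the measurement was along these lines; the size limit, file paths and output names below are placeholders (and the comparator flags are from memory), not the exact invocation:

    # iozone automatic mode, run once on the brick and once on the Gluster mount:
    iozone -a -g 4G -f /bricks/shared/iozone.tmp > baseline.txt   # brick disk (baseline)
    iozone -a -g 4G -f /mnt/shared/iozone.tmp    > set1.txt       # Gluster volume (set1)
    # Compare the two reports:
    iozone_results_comparator.py --baseline baseline.txt --set1 set1.txt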

 

The issue:

  • Smaller files are around a factor of 2 slower than larger files on the SSD, but a factor of 6-10 slower than larger files on the Gluster volume (somewhat expected).
  • Larger files are, except for certain read accesses, still a factor of 3-10 slower on the Gluster volume than directly on the SSD RAID, depending on the operation.
  • Operations on many, admittedly smaller, files (checkouts, copying, rsync and unpacking; see the timing sketch after this list) can stretch into hours, where they take tens of seconds to a few minutes on the local disk.
  • atop sometimes shows 9x% busy on some of the LVM block devices, and sometimes the avio value jumps from 4-26 ms to 3000-5000 ms. Otherwise nothing notable shows up in the system load.
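
To give a concrete example of the small-file case, these are the kinds of operations that stretch into hours on the volume (paths and the archive name are placeholders):

    # Unpacking a source tree with many small files:
    # tens of seconds to minutes on the brick disk, hours on the Gluster volume.
    time tar xf some-project.tar.gz -C /mnt/shared/build/
    # Same picture when rsyncing a checkout onto the volume:
    time rsync -a ~/checkout/ /mnt/shared/checkout/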

 

This does not seem to be network bound. Accessing the volume from a remote VM via a GlusterFS mount is not much slower, despite connecting over 10 GbE instead of 100 Gb/s InfiniBand. nethogs shows sent traffic on the InfiniBand interface going up to about 30,000 kB/s for large records, i.e. roughly 240 Mbit/s over a 100 Gb/s interface ...
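
To rule the link out completely, one could also measure raw throughput between two nodes over their IPoIB addresses, e.g. with iperf3 (hostname is a placeholder):

    # On node1:
    iperf3 -s
    # On node2, against node1's IPoIB address, with a few parallel streams:
    iperf3 -c node1-ib -P 4 -t 30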

The machine itself is mostly idle: no iowait, and a load average of about 6 on 104 logical CPUs. glusterfsd sometimes peaks at around 10% CPU. I just do not see where the bottleneck should be.
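
One thing I could still do is let Gluster itself report where the time goes, using the built-in per-volume profiler (volume name is a placeholder):

    # Collect per-brick FOP latency statistics while the slow workload runs:
    gluster volume profile shared start
    #   ... run the slow workload on the mount ...
    gluster volume profile shared info
    gluster volume profile shared stop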

 

Below is a summary from the compared iozone reports (baseline = brick disk, set1 = Gluster volume).

 

Any hints on whether it is worth trying to optimize this at all, and where to start looking? I believe I have checked all the “standard” hints (roughly the tunables sketched below), but I might have missed something.
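
By “standard” hints I mean the client/server tunables that are commonly suggested for small-file workloads, roughly along these lines (volume name and values are placeholders, and defaults differ between Gluster versions):

    # Commonly suggested small-file / metadata tunables:
    gluster volume set shared performance.client-io-threads on
    gluster volume set shared performance.parallel-readdir on
    gluster volume set shared cluster.lookup-optimize on
    gluster volume set shared features.cache-invalidation on
    gluster volume set shared performance.stat-prefetch on
    gluster volume set shared performance.cache-size 1GB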

 

The alternative would be a single storage node exported via NFS, sacrificing the redundancy/high availability of the replicated filesystem, which we probably do not need anyway.

 

Thanks a lot in advance for any comments,

 

Best

Rupert

summary

Operation | Write | Re-write | Read | Re-read | Random read | Random write | Backwards read | Record rewrite | Strided Read | Fwrite | Frewrite | Fread | Freread | ALL

baseline
first quartile | 1260.16 | 1894.11 | 3171.44 | 3002.55 | 3679.12 | 2214.07 | 2987.15 | 2701.4 | 3361.62 | 1783.45 | 1739.6 | 3231.97 | 3405.8 | 2152.28
median | 1474.31 | 2145.7 | 3637.6 | 3383.24 | 4167.16 | 2570.24 | 3435.44 | 3751.41 | 3967.55 | 2059.25 | 1992.64 | 3611.78 | 3936.96 | 3125.35
third quartile | 1595.17 | 2318.0 | 3992.78 | 3840.0 | 4803.77 | 3013.13 | 3864.87 | 4420.33 | 4673.13 | 2354.6 | 2258.15 | 4036.73 | 4570.02 | 3915.54
minimum | 194.64 | 960.92 | 1785.65 | 2152.15 | 1840.86 | 947.03 | 1491.1 | 1135.78 | 1732.88 | 999.01 | 933.54 | 1745.15 | 2083.72 | 194.64
maximum | 2041.55 | 2819.84 | 5628.61 | 7342.68 | 7142.61 | 4776.16 | 6091.17 | 15908.02 | 6922.3 | 3540.22 | 3787.35 | 5933.0 | 7510.19 | 15908.02
mean val. | 1408.31 | 2070.62 | 3663.03 | 3505.74 | 4231.31 | 2633.07 | 3444.43 | 4102.42 | 4029.73 | 2087.25 | 2027.23 | 3641.8 | 4113.02 | 3150.61
standard dev. | 306.93 | 366.43 | 652.73 | 816.6 | 1048.38 | 781.84 | 796.13 | 2277.33 | 1041.12 | 511.69 | 464.33 | 687.27 | 1012.13 | 1335.36
ci. min. 90% | 1363.0 | 2016.52 | 3566.67 | 3385.19 | 4076.54 | 2517.65 | 3326.9 | 3766.21 | 3876.03 | 2011.71 | 1958.68 | 3540.34 | 3963.6 | 3096.31
ci. max. 90% | 1453.62 | 2124.71 | 3759.39 | 3626.3 | 4386.08 | 2748.5 | 3561.96 | 4438.62 | 4183.44 | 2162.79 | 2095.78 | 3743.27 | 4262.44 | 3204.91
geom. mean | 1362.58 | 2034.2 | 3604.92 | 3422.21 | 4094.96 | 2511.93 | 3346.74 | 3637.05 | 3892.76 | 2024.11 | 1975.46 | 3576.11 | 3996.51 | 2883.44

set1
first quartile | 173.4 | 174.43 | 208.58 | 2813.71 | 3498.77 | 139.63 | 51.51 | 91.49 | 124.34 | 201.55 | 199.05 | 187.72 | 2509.21 | 189.36
median | 312.05 | 293.47 | 267.53 | 3134.22 | 4005.72 | 332.09 | 110.38 | 302.02 | 216.41 | 359.67 | 355.33 | 234.42 | 2742.35 | 357.31
third quartile | 414.07 | 410.15 | 401.42 | 3446.69 | 4429.1 | 436.45 | 178.94 | 429.6 | 397.15 | 463.62 | 480.29 | 317.98 | 3092.73 | 686.35
minimum | 44.11 | 45.6 | 59.07 | 2089.07 | 2004.07 | 9.39 | 8.67 | 2.77 | 32.58 | 38.63 | 41.09 | 26.0 | 1359.63 | 2.77
maximum | 606.95 | 610.49 | 4457.8 | 6248.75 | 6153.85 | 666.55 | 374.16 | 696.91 | 2593.43 | 647.26 | 700.02 | 451.39 | 4353.18 | 6248.75
mean val. | 308.37 | 309.17 | 544.27 | 3181.18 | 3964.96 | 305.76 | 123.49 | 282.89 | 402.01 | 337.11 | 342.11 | 248.76 | 2780.37 | 1010.03
standard dev. | 154.66 | 157.37 | 892.64 | 635.03 | 825.98 | 183.62 | 87.03 | 191.43 | 493.51 | 160.15 | 173.1 | 100.31 | 484.33 | 1358.34
ci. min. 90% | 285.54 | 285.93 | 412.49 | 3087.43 | 3843.02 | 278.65 | 110.65 | 254.63 | 329.15 | 313.47 | 316.56 | 233.95 | 2708.87 | 954.8
ci. max. 90% | 331.2 | 332.4 | 676.05 | 3274.93 | 4086.9 | 332.87 | 136.34 | 311.15 | 474.86 | 360.76 | 367.67 | 263.57 | 2851.87 | 1065.27
geom. mean | 260.27 | 260.16 | 318.88 | 3127.01 | 3872.62 | 220.54 | 86.71 | 174.2 | 243.43 | 285.24 | 286.02 | 223.44 | 2737.93 | 414.26
linear regression slope 90% | 0.17 - 0.28 | 0.1 - 0.21 | -0.05 - 0.35 | 0.83 - 0.97 | 0.85 - 1.0 | 0.08 - 0.15 | 0.02 - 0.05 | 0.04 - 0.07 | 0.04 - 0.17 | 0.12 - 0.2 | 0.13 - 0.22 | 0.05 - 0.09 | 0.6 - 0.72 | 0.28 - 0.35
ttest equality | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF | DIFF
baseline set1 difference | -78.1 % | -85.07 % | -85.14 % | -9.26 % | -6.29 % | -88.39 % | -96.41 % | -93.1 % | -90.02 % | -83.85 % | -83.12 % | -93.17 % | -32.4 % | -67.94 %
ttest p-value | 0.0 | 0.0 | 0.0 | 0.0005 | 0.026 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0

 

 

