Re: poor cephFS performance on Nautilus 14.2.9 deployed by ceph_ansible

Hi Derrick, 

I am not sure where this 200-300MB/s on an HDD comes from, but it is 
probably not really relevant. I test native disk performance before 
putting drives into Ceph, using the fio script below. It is a bit 
lengthy because I want to have data for possible future use cases. 

Furthermore, since I upgraded to Nautilus I have been having issues 
with the kernel cephfs mount on OSD nodes and had to revert to the 
fuse client, even with 88GB of memory free (co-locating the kernel 
client with an OSD is known to risk deadlocks under memory pressure):

https://tracker.ceph.com/issues/45663
https://tracker.ceph.com/issues/44100
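
For reference, the two mount variants look roughly like this (the 
monitor address, client name and secretfile path are placeholders for 
your own setup):

# kernel client
mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# fuse client
ceph-fuse --id admin /mnt/cephfs

Anyway, this is the fio script: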


[global]
ioengine=libaio
#ioengine=posixaio
invalidate=1
ramp_time=30
iodepth=1
runtime=180
time_based
direct=1
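# Direct, queue-depth-1 I/O gives a worst-case per-disk baseline.
# NB: the write jobs below are destructive; point filename only at a
# disk you can wipe (replace sdX with the device under test).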
filename=/dev/sdX
#filename=/mnt/cephfs/ssd/fio-bench.img

[write-4k-seq]
stonewall
bs=4k
rw=write
#write_bw_log=sdx-4k-write-seq.results
#write_iops_log=sdx-4k-write-seq.results

[randwrite-4k-seq]
stonewall
bs=4k
rw=randwrite
#write_bw_log=sdx-4k-randwrite-seq.results
#write_iops_log=sdx-4k-randwrite-seq.results

[read-4k-seq]
stonewall
bs=4k
rw=read
#write_bw_log=sdx-4k-read-seq.results
#write_iops_log=sdx-4k-read-seq.results

[randread-4k-seq]
stonewall
bs=4k
rw=randread
#write_bw_log=sdx-4k-randread-seq.results
#write_iops_log=sdx-4k-randread-seq.results

[rw-4k-seq]
stonewall
bs=4k
rw=rw
#write_bw_log=sdx-4k-rw-seq.results
#write_iops_log=sdx-4k-rw-seq.results

[randrw-4k-seq]
stonewall
bs=4k
rw=randrw
#write_bw_log=sdx-4k-randrw-seq.results
#write_iops_log=sdx-4k-randrw-seq.results

[write-128k-seq]
stonewall
bs=128k
rw=write
#write_bw_log=sdx-128k-write-seq.results
#write_iops_log=sdx-128k-write-seq.results

[randwrite-128k-seq]
stonewall
bs=128k
rw=randwrite
#write_bw_log=sdx-128k-randwrite-seq.results
#write_iops_log=sdx-128k-randwrite-seq.results

[read-128k-seq]
stonewall
bs=128k
rw=read
#write_bw_log=sdx-128k-read-seq.results
#write_iops_log=sdx-128k-read-seq.results

[randread-128k-seq]
stonewall
bs=128k
rw=randread
#write_bw_log=sdx-128k-randread-seq.results
#write_iops_log=sdx-128k-randread-seq.results

[rw-128k-seq]
stonewall
bs=128k
rw=rw
#write_bw_log=sdx-128k-rw-seq.results
#write_iops_log=sdx-128k-rw-seq.results

[randrw-128k-seq]
stonewall
bs=128k
rw=randrw
#write_bw_log=sdx-128k-randrw-seq.results
#write_iops_log=sdx-128k-randrw-seq.results

[write-1024k-seq]
stonewall
bs=1024k
rw=write
#write_bw_log=sdx-1024k-write-seq.results
#write_iops_log=sdx-1024k-write-seq.results

[randwrite-1024k-seq]
stonewall
bs=1024k
rw=randwrite
#write_bw_log=sdx-1024k-randwrite-seq.results
#write_iops_log=sdx-1024k-randwrite-seq.results

[read-1024k-seq]
stonewall
bs=1024k
rw=read
#write_bw_log=sdx-1024k-read-seq.results
#write_iops_log=sdx-1024k-read-seq.results

[randread-1024k-seq]
stonewall
bs=1024k
rw=randread
#write_bw_log=sdx-1024k-randread-seq.results
#write_iops_log=sdx-1024k-randread-seq.results

[rw-1024k-seq]
stonewall
bs=1024k
rw=rw
#write_bw_log=sdx-1024k-rw-seq.results
#write_iops_log=sdx-1024k-rw-seq.results

[randrw-1024k-seq]
stonewall
bs=1024k
rw=randrw
#write_bw_log=sdx-1024k-randrw-seq.results
#write_iops_log=sdx-1024k-randrw-seq.results

[write-4096k-seq]
stonewall
bs=4096k
rw=write
#write_bw_log=sdx-4096k-write-seq.results
#write_iops_log=sdx-4096k-write-seq.results

[randwrite-4096k-seq]
stonewall
bs=4096k
rw=randwrite
#write_bw_log=sdx-4096k-randwrite-seq.results
#write_iops_log=sdx-4096k-randwrite-seq.results

[read-4096k-seq]
stonewall
bs=4096k
rw=read
#write_bw_log=sdx-4096k-read-seq.results
#write_iops_log=sdx-4096k-read-seq.results

[randread-4096k-seq]
stonewall
bs=4096k
rw=randread
#write_bw_log=sdx-4096k-randread-seq.results
#write_iops_log=sdx-4096k-randread-seq.results

[rw-4096k-seq]
stonewall
bs=4096k
rw=rw
#write_bw_log=sdx-4096k-rw-seq.results
#write_iops_log=sdx-4096k-rw-seq.results

[randrw-4096k-seq]
stonewall
bs=4096k
rw=randrw
#write_bw_log=sdx-4096k-randrw-seq.results
#write_iops_log=sdx-4096k-randrw-seq.results
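
If you save the above as, say, disk-bench.fio (the name is just an 
example), you can run the whole matrix or a single job like this 
(again, the write jobs destroy whatever is on /dev/sdX):

fio disk-bench.fio
fio --section=write-4k-seq disk-bench.fio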
 
-----Original Message-----
From: Derrick Lin [mailto:klin938@xxxxxxxxx] 
Sent: Tuesday, 9 June 2020 4:12
To: Mark Nelson
Cc: ceph-users@xxxxxxx
Subject: Re: poor cephFS performance on Nautilus 14.2.9 deployed by ceph_ansible

Thanks Mark & Marc

We will do more testing, including the kernel client, and will test the 
block storage performance first, e.g. along these lines:
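
Something like this should give us a baseline for the RADOS layer 
itself, independent of cephfs (the pool name "testpool" is just a 
placeholder):

rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados -p testpool cleanup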

We just did some direct raw performance tests on a single spinning disk 
(formatted as ext4) and it could deliver 200-300MB/s of throughput in 
various write and mixed-workload tests, but the FUSE client could only 
give ~50MB/s.

Cheers,
D

On Thu, Jun 4, 2020 at 1:27 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:

> Try using the kernel client instead of the FUSE client.  The FUSE 
> client is known to be slow for a variety of reasons and I suspect you 
> may see faster performance with the kernel client.
>
>
> Thanks,
>
> Mark
>
>
> On 6/2/20 8:00 PM, Derrick Lin wrote:
> > Hi guys,
> >
> > We just deployed a CEPH 14.2.9 cluster with the following hardware:
> >
> > MDSS x 1
> > Xeon Gold 5122 3.6Ghz
> > 192GB
> > Mellanox ConnectX-4 Lx 25GbE
> >
> >
> > MON x 3
> > Xeon Bronze 3103 1.7Ghz
> > 48GB
> > Mellanox ConnectX-4 Lx 25GbE
> > 6 x 600GB 10K SAS
> >
> > OSD x 5
> > Xeon Silver 4110 2.1Ghz x 2
> > 192GB
> > Mellanox ConnectX-4 Lx 25GbE
> > 16 x 10TB 7.2K NLSAS (block)
> > 2 x 2TB Intel P4600 NVMe (block.db)
> >
> > Network is all Mellanox SN2410/SN2700 configured at 25GbE for both 
> > front and back network.
> >
> > Just for POC at this stage, the cluster was deployed by ceph_ansible
> > without much customization, and the initial test of its cephFS FUSE
> > mount performance seems to be very low. We did some tests with iozone;
> > the results are as follows:
> >
> > ]# /opt/iozone/bin/iozone -i 0 -i 1-r 128k -s 5G -t 20
> >          Iozone: Performance Test of File I/O
> >                  Version $Revision: 3.465 $
> >                  Compiled for 64 bit mode.
> >                  Build: linux-AMD64
> >
> >          Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
> >                        Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
> >                        Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
> >                        Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
> >                        Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
> >                        Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
> >                        Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
> >                        Vangel Bojaxhi, Ben England, Vikentsi Lapa,
> >                        Alexey Skidanov.
> >
> >          Run began: Tue Jun  2 16:40:53 2020
> >
> >          File size set to 5242880 kB
> >          Command line used: /opt/iozone/bin/iozone -i 0 -i 1-r -s 5G -t 20 128k
> >          Output is in kBytes/sec
> >          Time Resolution = 0.000001 seconds.
> >          Processor cache size set to 1024 kBytes.
> >          Processor cache line size set to 32 bytes.
> >          File stride size set to 17 * record size.
> >          Throughput test with 20 processes
> >          Each process writes a 5242880 kByte file in 4 kByte records
> >
> >          Children see throughput for 20 initial writers  =    35001.12 kB/sec
> >          Parent sees throughput for 20 initial writers   =    34967.65 kB/sec
> >          Min throughput per process                      =     1748.22 kB/sec
> >          Max throughput per process                      =     1751.62 kB/sec
> >          Avg throughput per process                      =     1750.06 kB/sec
> >          Min xfer                                        =  5232724.00 kB
> >
> >          Children see throughput for 20 rewriters        =    35704.79 kB/sec
> >          Parent sees throughput for 20 rewriters         =    35704.30 kB/sec
> >          Min throughput per process                      =     1783.44 kB/sec
> >          Max throughput per process                      =     1786.29 kB/sec
> >          Avg throughput per process                      =     1785.24 kB/sec
> >          Min xfer                                        =  5234532.00 kB
> >
> >          Children see throughput for 20 readers          = 49368539.50 kB/sec
> >          Parent sees throughput for 20 readers           = 49317231.38 kB/sec
> >          Min throughput per process                      =  2414424.00 kB/sec
> >          Max throughput per process                      =  2599996.25 kB/sec
> >          Avg throughput per process                      =  2468426.98 kB/sec
> >          Min xfer                                        =  4868708.00 kB
> >
> >          Children see throughput for 20 re-readers       = 48675891.50 kB/sec
> >          Parent sees throughput for 20 re-readers        = 48617335.67 kB/sec
> >          Min throughput per process                      =  2316395.25 kB/sec
> >          Max throughput per process                      =  2703868.75 kB/sec
> >          Avg throughput per process                      =  2433794.58 kB/sec
> >          Min xfer                                        =  4491704.00 kB
> >
> > We also did some dd tests; the write speed of a single test on our
> > standard server is ~50MB/s, but on a server with very large memory
> > the speed is about double, 80-90MB/s.
> >
> > We have zero experience with Ceph and, as said, we haven't done more
> > tuning at this stage. But is this sort of performance way too low for
> > this hardware spec?
> >
> > Any hints will be appreciated.
> >
> > Cheers
> > D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


