Re: poor cephFS performance on Nautilus 14.2.9 deployed by ceph_ansible

Hi guys,

I tried mounting via the kernel driver and it works beautifully. I was
surprised; below is one of the FIO tests, which wasn't able to run at all
on the FUSE mount:

# /usr/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 \
    --gtod_reduce=1 --name=FIO --filename=fio.test --bs=4M --iodepth=16 \
    --size=50G --readwrite=randrw --rwmixread=75
FIO: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
fio-3.7
Starting 1 process
FIO: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=1021MiB/s,w=340MiB/s][r=255,w=85 IOPS][eta 00m:00s]
FIO: (groupid=0, jobs=1): err= 0: pid=131431: Thu Jun 11 17:13:22 2020
   read: IOPS=249, BW=999MiB/s (1047MB/s)(37.5GiB/38408msec)
   bw (  KiB/s): min=819200, max=1171456, per=100.00%, avg=1023387.46, stdev=69360.06, samples=76
   iops        : min=  200, max=  286, avg=249.83, stdev=16.96, samples=76
  write: IOPS=83, BW=334MiB/s (351MB/s)(12.5GiB/38408msec)
   bw (  KiB/s): min=229376, max=475136, per=99.96%, avg=342204.45, stdev=40407.55, samples=76
   iops        : min=   56, max=  116, avg=83.51, stdev= 9.87, samples=76
  cpu          : usr=1.56%, sys=4.44%, ctx=12050, majf=0, minf=24
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=9590,3210,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=999MiB/s (1047MB/s), 999MiB/s-999MiB/s (1047MB/s-1047MB/s), io=37.5GiB (40.2GB), run=38408-38408msec
  WRITE: bw=334MiB/s (351MB/s), 334MiB/s-334MiB/s (351MB/s-351MB/s), io=12.5GiB (13.5GB), run=38408-38408msec
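
For reference, the two mounts can be set up roughly like this; the monitor
address, client name and mount point below are only placeholders for your
own cluster:

# kernel client
mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# FUSE client, for comparison
ceph-fuse -m 192.168.0.10:6789 /mnt/cephfs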




On Tue, Jun 9, 2020 at 6:16 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:

>
> Hi Derrick,
>
> I am not sure what this 200-300MB/s on hdd is, but it is probably not
> really relevant. I test native disk performance before I use the disks
> with ceph, using the fio script below. It is a bit lengthy because I
> want to have data for possible future use cases.
>
> Furthermore, since I upgraded to Nautilus I have been having issues with
> the cephfs kernel mount on OSD nodes and had to revert back to FUSE,
> even with 88GB of free memory.
>
> https://tracker.ceph.com/issues/45663
> https://tracker.ceph.com/issues/44100
>
>
> [global]
> ioengine=libaio
> #ioengine=posixaio
> invalidate=1
> ramp_time=30
> iodepth=1
> runtime=180
> time_based
> direct=1
> filename=/dev/sdX
> #filename=/mnt/cephfs/ssd/fio-bench.img
>
> [write-4k-seq]
> stonewall
> bs=4k
> rw=write
> #write_bw_log=sdx-4k-write-seq.results
> #write_iops_log=sdx-4k-write-seq.results
>
> [randwrite-4k-seq]
> stonewall
> bs=4k
> rw=randwrite
> #write_bw_log=sdx-4k-randwrite-seq.results
> #write_iops_log=sdx-4k-randwrite-seq.results
>
> [read-4k-seq]
> stonewall
> bs=4k
> rw=read
> #write_bw_log=sdx-4k-read-seq.results
> #write_iops_log=sdx-4k-read-seq.results
>
> [randread-4k-seq]
> stonewall
> bs=4k
> rw=randread
> #write_bw_log=sdx-4k-randread-seq.results
> #write_iops_log=sdx-4k-randread-seq.results
>
> [rw-4k-seq]
> stonewall
> bs=4k
> rw=rw
> #write_bw_log=sdx-4k-rw-seq.results
> #write_iops_log=sdx-4k-rw-seq.results
>
> [randrw-4k-seq]
> stonewall
> bs=4k
> rw=randrw
> #write_bw_log=sdx-4k-randrw-seq.results
> #write_iops_log=sdx-4k-randrw-seq.results
>
> [write-128k-seq]
> stonewall
> bs=128k
> rw=write
> #write_bw_log=sdx-128k-write-seq.results
> #write_iops_log=sdx-128k-write-seq.results
>
> [randwrite-128k-seq]
> stonewall
> bs=128k
> rw=randwrite
> #write_bw_log=sdx-128k-randwrite-seq.results
> #write_iops_log=sdx-128k-randwrite-seq.results
>
> [read-128k-seq]
> stonewall
> bs=128k
> rw=read
> #write_bw_log=sdx-128k-read-seq.results
> #write_iops_log=sdx-128k-read-seq.results
>
> [randread-128k-seq]
> stonewall
> bs=128k
> rw=randread
> #write_bw_log=sdx-128k-randread-seq.results
> #write_iops_log=sdx-128k-randread-seq.results
>
> [rw-128k-seq]
> stonewall
> bs=128k
> rw=rw
> #write_bw_log=sdx-128k-rw-seq.results
> #write_iops_log=sdx-128k-rw-seq.results
>
> [randrw-128k-seq]
> stonewall
> bs=128k
> rw=randrw
> #write_bw_log=sdx-128k-randrw-seq.results
> #write_iops_log=sdx-128k-randrw-seq.results
>
> [write-1024k-seq]
> stonewall
> bs=1024k
> rw=write
> #write_bw_log=sdx-1024k-write-seq.results
> #write_iops_log=sdx-1024k-write-seq.results
>
> [randwrite-1024k-seq]
> stonewall
> bs=1024k
> rw=randwrite
> #write_bw_log=sdx-1024k-randwrite-seq.results
> #write_iops_log=sdx-1024k-randwrite-seq.results
>
> [read-1024k-seq]
> stonewall
> bs=1024k
> rw=read
> #write_bw_log=sdx-1024k-read-seq.results
> #write_iops_log=sdx-1024k-read-seq.results
>
> [randread-1024k-seq]
> stonewall
> bs=1024k
> rw=randread
> #write_bw_log=sdx-1024k-randread-seq.results
> #write_iops_log=sdx-1024k-randread-seq.results
>
> [rw-1024k-seq]
> stonewall
> bs=1024k
> rw=rw
> #write_bw_log=sdx-1024k-rw-seq.results
> #write_iops_log=sdx-1024k-rw-seq.results
>
> [randrw-1024k-seq]
> stonewall
> bs=1024k
> rw=randrw
> #write_bw_log=sdx-1024k-randrw-seq.results
> #write_iops_log=sdx-1024k-randrw-seq.results
>
> [write-4096k-seq]
> stonewall
> bs=4096k
> rw=write
> #write_bw_log=sdx-4096k-write-seq.results
> #write_iops_log=sdx-4096k-write-seq.results
>
> [randwrite-4096k-seq]
> stonewall
> bs=4096k
> rw=randwrite
> #write_bw_log=sdx-4096k-randwrite-seq.results
> #write_iops_log=sdx-4096k-randwrite-seq.results
>
> [read-4096k-seq]
> stonewall
> bs=4096k
> rw=read
> #write_bw_log=sdx-4096k-read-seq.results
> #write_iops_log=sdx-4096k-read-seq.results
>
> [randread-4096k-seq]
> stonewall
> bs=4096k
> rw=randread
> #write_bw_log=sdx-4096k-randread-seq.results
> #write_iops_log=sdx-4096k-randread-seq.results
>
> [rw-4096k-seq]
> stonewall
> bs=4096k
> rw=rw
> #write_bw_log=sdx-4096k-rw-seq.results
> #write_iops_log=sdx-4096k-rw-seq.results
>
> [randrw-4096k-seq]
> stonewall
> bs=4096k
> rw=randrw
> #write_bw_log=sdx-4096k-randrw-seq.results
> #write_iops_log=sdx-4096k-randrw-seq.results
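>
> After pointing filename= at the disk under test, a run looks something
> like this (the job file name and output path are just examples):
>
> fio disk-bench.fio --output=/root/sdX-results.txt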
>
>
>
>
>
>
> -----Original Message-----
> From: Derrick Lin [mailto:klin938@xxxxxxxxx]
> Sent: Tuesday 9 June 2020 4:12
> To: Mark Nelson
> Cc: ceph-users@xxxxxxx
> Subject:  Re: poor cephFS performance on Nautilus 14.2.9
> deployed by ceph_ansible
>
> Thanks Mark & Marc
>
> We will do more testing, including the kernel client, as well as testing
> the block storage performance first.
>
> We just did some direct raw performance tests on a single spinning disk
> (formatted as ext4) and it could deliver 200-300MB/s throughput in various
> write and mixed tests, but the FUSE client could only give ~50MB/s.
>
> Cheers,
> D
>
> On Thu, Jun 4, 2020 at 1:27 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> > Try using the kernel client instead of the FUSE client.  The FUSE
> > client is known to be slow for a variety of reasons and I suspect you
> > may see faster performance with the kernel client.
> >
> >
> > Thanks,
> >
> > Mark
> >
> >
> > On 6/2/20 8:00 PM, Derrick Lin wrote:
> > > Hi guys,
> > >
> > > We just deployed a Ceph 14.2.9 cluster with the following hardware:
> > >
> > > MDSS x 1
> > > Xeon Gold 5122 3.6Ghz
> > > 192GB
> > > Mellanox ConnectX-4 Lx 25GbE
> > >
> > >
> > > MON x 3
> > > Xeon Bronze 3103 1.7Ghz
> > > 48GB
> > > Mellanox ConnectX-4 Lx 25GbE
> > > 6 x 600GB 10K SAS
> > >
> > > OSD x 5
> > > Xeon Silver 4110 2.1Ghz x 2
> > > 192GB
> > > Mellanox ConnectX-4 Lx 25GbE
> > > 16 x 10TB 7.2K NLSAS (block)
> > > 2 x 2TB Intel P4600 NVMe (block.db)
> > >
> > > Network is all Mellanox SN2410/SN2700 configured at 25GbE for both
> > > front and back network.
> > >
> > > Just for a POC at this stage, the cluster was deployed by ceph_ansible
> > > without much customization, and the initial tests of its cephFS FUSE
> > > mount performance seem to be very low. We did some tests with iozone;
> > > the results are as follows:
> > >
> > > ]# /opt/iozone/bin/iozone -i 0 -i 1-r 128k -s 5G -t 20
> > >          Iozone: Performance Test of File I/O
> > >                  Version $Revision: 3.465 $
> > >                  Compiled for 64 bit mode.
> > >                  Build: linux-AMD64
> > >
> > >          Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
> > >                       Al Slater, Scott Rhine, Mike Wisner, Ken Goss
> > >                       Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
> > >                       Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
> > >                       Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
> > >                       Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
> > >                       Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
> > >                       Vangel Bojaxhi, Ben England, Vikentsi Lapa,
> > >                       Alexey Skidanov.
> > >
> > >          Run began: Tue Jun  2 16:40:53 2020
> > >
> > >          File size set to 5242880 kB
> > >          Command line used: /opt/iozone/bin/iozone -i 0 -i 1-r -s 5G -t 20 128k
> > >          Output is in kBytes/sec
> > >          Time Resolution = 0.000001 seconds.
> > >          Processor cache size set to 1024 kBytes.
> > >          Processor cache line size set to 32 bytes.
> > >          File stride size set to 17 * record size.
> > >          Throughput test with 20 processes
> > >          Each process writes a 5242880 kByte file in 4 kByte records
> > >
> > >          Children see throughput for 20 initial writers  =   35001.12 kB/sec
> > >          Parent sees throughput for 20 initial writers   =   34967.65 kB/sec
> > >          Min throughput per process                      =    1748.22 kB/sec
> > >          Max throughput per process                      =    1751.62 kB/sec
> > >          Avg throughput per process                      =    1750.06 kB/sec
> > >          Min xfer                                        = 5232724.00 kB
> > >
> > >          Children see throughput for 20 rewriters        =   35704.79 kB/sec
> > >          Parent sees throughput for 20 rewriters         =   35704.30 kB/sec
> > >          Min throughput per process                      =    1783.44 kB/sec
> > >          Max throughput per process                      =    1786.29 kB/sec
> > >          Avg throughput per process                      =    1785.24 kB/sec
> > >          Min xfer                                        = 5234532.00 kB
> > >
> > >          Children see throughput for 20 readers          = 49368539.50 kB/sec
> > >          Parent sees throughput for 20 readers           = 49317231.38 kB/sec
> > >          Min throughput per process                      =  2414424.00 kB/sec
> > >          Max throughput per process                      =  2599996.25 kB/sec
> > >          Avg throughput per process                      =  2468426.98 kB/sec
> > >          Min xfer                                        =  4868708.00 kB
> > >
> > >          Children see throughput for 20 re-readers       = 48675891.50 kB/sec
> > >          Parent sees throughput for 20 re-readers        = 48617335.67 kB/sec
> > >          Min throughput per process                      =  2316395.25 kB/sec
> > >          Max throughput per process                      =  2703868.75 kB/sec
> > >          Avg throughput per process                      =  2433794.58 kB/sec
> > >          Min xfer                                        =  4491704.00 kB
> > >
> > > We also did some dd tests. The write speed in a single test on our
> > > standard server is ~50MB/s, but on a server with a lot more memory the
> > > speed is roughly double, ~80-90MB/s.
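> > >
> > > The dd runs were simple sequential writes along the lines of the
> > > command below (the path, size and flags are only an example):
> > >
> > > dd if=/dev/zero of=/mnt/cephfs/dd.test bs=1M count=10240 oflag=direct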
> > >
> > > We have zero experience with Ceph and, as said, we haven't done more
> > > tuning at this stage. But is this sort of performance way too low for
> > > that hardware spec?
> > >
> > > Any hints will be appreciated.
> > >
> > > Cheers
> > > D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


