Re: poor cephFS performance on Nautilus 14.2.9 deployed by ceph_ansible

I have encountered an issue where clients hang when opening a file.
In addition, any other client that accessed the same file or directory
hung as well. The only way to resolve it was to reboot the client
servers. This happened with the kernel client only, on Luminous.
Since then I have chosen the FUSE client, except where a client has a
large performance need.
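
For reference, switching to the FUSE client is just a matter of mounting
with ceph-fuse instead of the kernel driver. A minimal sketch (the monitor
host and mount point below are placeholders, not our actual values):

# typically picks up client.admin and the keyring under /etc/ceph
ceph-fuse -m mon1:6789 /mnt/cephfs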

Derrick Lin <klin938@xxxxxxxxx> wrote on Mon, 15 Jun 2020 at 3:28 PM:

> Hi guys,
>
> I tried mounting via the kernel driver and it works beautifully. I was
> surprised; below is one of the FIO tests, which wasn't able to run at all
> on the FUSE mount:
>
> # /usr/bin/fio --randrepeat=1 --ioengine=libaio --direct=1
> --gtod_reduce=1 --name=FIO --filename=fio.test --bs=4M --iodepth=16
> --size=50G --readwrite=randrw --rwmixread=75
> FIO: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB,
> (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
> fio-3.7
> Starting 1 process
> FIO: Laying out IO file (1 file / 51200MiB)
> Jobs: 1 (f=1): [m(1)][100.0%][r=1021MiB/s,w=340MiB/s][r=255,w=85 IOPS][eta 00m:00s]
> FIO: (groupid=0, jobs=1): err= 0: pid=131431: Thu Jun 11 17:13:22 2020
>    read: IOPS=249, BW=999MiB/s (1047MB/s)(37.5GiB/38408msec)
>    bw (  KiB/s): min=819200, max=1171456, per=100.00%, avg=1023387.46, stdev=69360.06, samples=76
>    iops        : min=  200, max=  286, avg=249.83, stdev=16.96, samples=76
>   write: IOPS=83, BW=334MiB/s (351MB/s)(12.5GiB/38408msec)
>    bw (  KiB/s): min=229376, max=475136, per=99.96%, avg=342204.45, stdev=40407.55, samples=76
>    iops        : min=   56, max=  116, avg=83.51, stdev= 9.87, samples=76
>   cpu          : usr=1.56%, sys=4.44%, ctx=12050, majf=0, minf=24
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued rwts: total=9590,3210,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=16
>
> Run status group 0 (all jobs):
>    READ: bw=999MiB/s (1047MB/s), 999MiB/s-999MiB/s (1047MB/s-1047MB/s), io=37.5GiB (40.2GB), run=38408-38408msec
>   WRITE: bw=334MiB/s (351MB/s), 334MiB/s-334MiB/s (351MB/s-351MB/s), io=12.5GiB (13.5GB), run=38408-38408msec
>
>
>
>
> On Tue, Jun 9, 2020 at 6:16 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
> >
> > Hi Derrick,
> >
> > I am not sure what this 200-300MB/s on HDD is, but it is probably not
> > really relevant. I test native disk performance before using the disks
> > with Ceph, with the fio script below. It is a bit lengthy because I
> > want to have data for possible future use cases.
> >
> > Furthermore, since I upgraded to Nautilus I have been having issues with
> > the kernel CephFS mount on the OSD nodes and had to revert back to FUSE,
> > even with 88GB of free memory.
> >
> > https://tracker.ceph.com/issues/45663
> > https://tracker.ceph.com/issues/44100
> >
> >
> > [global]
> > ioengine=libaio
> > #ioengine=posixaio
> > invalidate=1
> > ramp_time=30
> > iodepth=1
> > runtime=180
> > time_based
> > direct=1
> > filename=/dev/sdX
> > #filename=/mnt/cephfs/ssd/fio-bench.img
> >
> > [write-4k-seq]
> > stonewall
> > bs=4k
> > rw=write
> > #write_bw_log=sdx-4k-write-seq.results
> > #write_iops_log=sdx-4k-write-seq.results
> >
> > [randwrite-4k-seq]
> > stonewall
> > bs=4k
> > rw=randwrite
> > #write_bw_log=sdx-4k-randwrite-seq.results
> > #write_iops_log=sdx-4k-randwrite-seq.results
> >
> > [read-4k-seq]
> > stonewall
> > bs=4k
> > rw=read
> > #write_bw_log=sdx-4k-read-seq.results
> > #write_iops_log=sdx-4k-read-seq.results
> >
> > [randread-4k-seq]
> > stonewall
> > bs=4k
> > rw=randread
> > #write_bw_log=sdx-4k-randread-seq.results
> > #write_iops_log=sdx-4k-randread-seq.results
> >
> > [rw-4k-seq]
> > stonewall
> > bs=4k
> > rw=rw
> > #write_bw_log=sdx-4k-rw-seq.results
> > #write_iops_log=sdx-4k-rw-seq.results
> >
> > [randrw-4k-seq]
> > stonewall
> > bs=4k
> > rw=randrw
> > #write_bw_log=sdx-4k-randrw-seq.results
> > #write_iops_log=sdx-4k-randrw-seq.results
> >
> > [write-128k-seq]
> > stonewall
> > bs=128k
> > rw=write
> > #write_bw_log=sdx-128k-write-seq.results
> > #write_iops_log=sdx-128k-write-seq.results
> >
> > [randwrite-128k-seq]
> > stonewall
> > bs=128k
> > rw=randwrite
> > #write_bw_log=sdx-128k-randwrite-seq.results
> > #write_iops_log=sdx-128k-randwrite-seq.results
> >
> > [read-128k-seq]
> > stonewall
> > bs=128k
> > rw=read
> > #write_bw_log=sdx-128k-read-seq.results
> > #write_iops_log=sdx-128k-read-seq.results
> >
> > [randread-128k-seq]
> > stonewall
> > bs=128k
> > rw=randread
> > #write_bw_log=sdx-128k-randread-seq.results
> > #write_iops_log=sdx-128k-randread-seq.results
> >
> > [rw-128k-seq]
> > stonewall
> > bs=128k
> > rw=rw
> > #write_bw_log=sdx-128k-rw-seq.results
> > #write_iops_log=sdx-128k-rw-seq.results
> >
> > [randrw-128k-seq]
> > stonewall
> > bs=128k
> > rw=randrw
> > #write_bw_log=sdx-128k-randrw-seq.results
> > #write_iops_log=sdx-128k-randrw-seq.results
> >
> > [write-1024k-seq]
> > stonewall
> > bs=1024k
> > rw=write
> > #write_bw_log=sdx-1024k-write-seq.results
> > #write_iops_log=sdx-1024k-write-seq.results
> >
> > [randwrite-1024k-seq]
> > stonewall
> > bs=1024k
> > rw=randwrite
> > #write_bw_log=sdx-1024k-randwrite-seq.results
> > #write_iops_log=sdx-1024k-randwrite-seq.results
> >
> > [read-1024k-seq]
> > stonewall
> > bs=1024k
> > rw=read
> > #write_bw_log=sdx-1024k-read-seq.results
> > #write_iops_log=sdx-1024k-read-seq.results
> >
> > [randread-1024k-seq]
> > stonewall
> > bs=1024k
> > rw=randread
> > #write_bw_log=sdx-1024k-randread-seq.results
> > #write_iops_log=sdx-1024k-randread-seq.results
> >
> > [rw-1024k-seq]
> > stonewall
> > bs=1024k
> > rw=rw
> > #write_bw_log=sdx-1024k-rw-seq.results
> > #write_iops_log=sdx-1024k-rw-seq.results
> >
> > [randrw-1024k-seq]
> > stonewall
> > bs=1024k
> > rw=randrw
> > #write_bw_log=sdx-1024k-randrw-seq.results
> > #write_iops_log=sdx-1024k-randrw-seq.results
> >
> > [write-4096k-seq]
> > stonewall
> > bs=4096k
> > rw=write
> > #write_bw_log=sdx-4096k-write-seq.results
> > #write_iops_log=sdx-4096k-write-seq.results
> >
> > [randwrite-4096k-seq]
> > stonewall
> > bs=4096k
> > rw=randwrite
> > #write_bw_log=sdx-4096k-randwrite-seq.results
> > #write_iops_log=sdx-4096k-randwrite-seq.results
> >
> > [read-4096k-seq]
> > stonewall
> > bs=4096k
> > rw=read
> > #write_bw_log=sdx-4096k-read-seq.results
> > #write_iops_log=sdx-4096k-read-seq.results
> >
> > [randread-4096k-seq]
> > stonewall
> > bs=4096k
> > rw=randread
> > #write_bw_log=sdx-4096k-randread-seq.results
> > #write_iops_log=sdx-4096k-randread-seq.results
> >
> > [rw-4096k-seq]
> > stonewall
> > bs=4096k
> > rw=rw
> > #write_bw_log=sdx-4096k-rw-seq.results
> > #write_iops_log=sdx-4096k-rw-seq.results
> >
> > [randrw-4096k-seq]
> > stonewall
> > bs=4096k
> > rw=randrw
> > #write_bw_log=sdx-4096k-randrw-seq.results
> > #write_iops_log=sdx-4096k-randrw-seq.results
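> >
> > (Usage sketch: save the job file above as e.g. disk-bench.fio -- the name
> > is just an example -- set filename= to the device or file under test, and
> > run it with: fio disk-bench.fio)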
> >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Derrick Lin [mailto:klin938@xxxxxxxxx]
> > Sent: Tuesday, 9 June 2020 4:12
> > To: Mark Nelson
> > Cc: ceph-users@xxxxxxx
> > Subject:  Re: poor cephFS performance on Nautilus 14.2.9
> > deployed by ceph_ansible
> >
> > Thanks Mark & Marc
> >
> > We will do more testing, including the kernel client, as well as testing
> > the block storage performance first.
> >
> > We just did some direct raw performance tests on a single spinning disk
> > (formatted as ext4) and it could deliver 200-300MB/s throughput in various
> > write and mixed tests, but the FUSE client could only give ~50MB/s.
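> >
> > A sketch of the kind of comparison we ran (mount points, file paths and
> > job names are placeholders, not the exact commands we used):
> >
> > # single spinning disk, mounted as ext4
> > fio --name=ext4-write --filename=/data1/fio.test --size=10G --rw=write --bs=1M --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based
> >
> > # same workload against a file on the CephFS FUSE mount
> > fio --name=fuse-write --filename=/mnt/cephfs/fio.test --size=10G --rw=write --bs=1M --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based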
> >
> > Cheers,
> > D
> >
> > On Thu, Jun 4, 2020 at 1:27 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> >
> > > Try using the kernel client instead of the FUSE client.  The FUSE
> > > client is known to be slow for a variety of reasons and I suspect you
> > > may see faster performance with the kernel client.
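> > >
> > > A minimal kernel-client mount, for comparison (monitor address, user
> > > name and secret file are placeholders):
> > >
> > > # mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret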
> > >
> > >
> > > Thanks,
> > >
> > > Mark
> > >
> > >
> > > On 6/2/20 8:00 PM, Derrick Lin wrote:
> > > > Hi guys,
> > > >
> > > > We just deployed a CEPH 14.2.9 cluster with the following hardware:
> > > >
> > > > MDSS x 1
> > > > Xeon Gold 5122 3.6Ghz
> > > > 192GB
> > > > Mellanox ConnectX-4 Lx 25GbE
> > > >
> > > >
> > > > MON x 3
> > > > Xeon Bronze 3103 1.7Ghz
> > > > 48GB
> > > > Mellanox ConnectX-4 Lx 25GbE
> > > > 6 x 600GB 10K SAS
> > > >
> > > > OSD x 5
> > > > Xeon Silver 4110 2.1Ghz x 2
> > > > 192GB
> > > > Mellanox ConnectX-4 Lx 25GbE
> > > > 16 x 10TB 7.2K NLSAS (block)
> > > > 2 x 2TB Intel P4600 NVMe (block.db)
> > > >
> > > > Network is all Mellanox SN2410/SN2700 configured at 25GbE for both
> > > > front and back network.
> > > >
> > > > Just for a POC at this stage, the cluster was deployed by ceph_ansible
> > > > without much customization, and the initial tests of its CephFS FUSE
> > > > mount performance seem to be very low. We did some tests with iozone;
> > > > the results are as follows:
> > > >
> > > > ]# /opt/iozone/bin/iozone -i 0 -i 1-r 128k -s 5G -t 20
> > > >          Iozone: Performance Test of File I/O
> > > >                  Version $Revision: 3.465 $
> > > >                  Compiled for 64 bit mode.
> > > >                  Build: linux-AMD64
> > > >
> > > >          Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
> > > >                       Al Slater, Scott Rhine, Mike Wisner, Ken Goss
> > > >                       Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
> > > >                       Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
> > > >                       Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
> > > >                       Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
> > > >                       Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
> > > >                       Vangel Bojaxhi, Ben England, Vikentsi Lapa,
> > > >                       Alexey Skidanov.
> > > >
> > > >          Run began: Tue Jun  2 16:40:53 2020
> > > >
> > > >          File size set to 5242880 kB
> > > >          Command line used: /opt/iozone/bin/iozone -i 0 -i 1-r -s 5G -t 20 128k
> > > >          Output is in kBytes/sec
> > > >          Time Resolution = 0.000001 seconds.
> > > >          Processor cache size set to 1024 kBytes.
> > > >          Processor cache line size set to 32 bytes.
> > > >          File stride size set to 17 * record size.
> > > >          Throughput test with 20 processes
> > > >          Each process writes a 5242880 kByte file in 4 kByte records
> > > >
> > > >          Children see throughput for 20 initial writers  =    35001.12 kB/sec
> > > >          Parent sees throughput for 20 initial writers   =    34967.65 kB/sec
> > > >          Min throughput per process                      =     1748.22 kB/sec
> > > >          Max throughput per process                      =     1751.62 kB/sec
> > > >          Avg throughput per process                      =     1750.06 kB/sec
> > > >          Min xfer                                        =  5232724.00 kB
> > > >
> > > >          Children see throughput for 20 rewriters        =    35704.79 kB/sec
> > > >          Parent sees throughput for 20 rewriters         =    35704.30 kB/sec
> > > >          Min throughput per process                      =     1783.44 kB/sec
> > > >          Max throughput per process                      =     1786.29 kB/sec
> > > >          Avg throughput per process                      =     1785.24 kB/sec
> > > >          Min xfer                                        =  5234532.00 kB
> > > >
> > > >          Children see throughput for 20 readers          = 49368539.50 kB/sec
> > > >          Parent sees throughput for 20 readers           = 49317231.38 kB/sec
> > > >          Min throughput per process                      =  2414424.00 kB/sec
> > > >          Max throughput per process                      =  2599996.25 kB/sec
> > > >          Avg throughput per process                      =  2468426.98 kB/sec
> > > >          Min xfer                                        =  4868708.00 kB
> > > >
> > > >          Children see throughput for 20 re-readers       = 48675891.50 kB/sec
> > > >          Parent sees throughput for 20 re-readers        = 48617335.67 kB/sec
> > > >          Min throughput per process                      =  2316395.25 kB/sec
> > > >          Max throughput per process                      =  2703868.75 kB/sec
> > > >          Avg throughput per process                      =  2433794.58 kB/sec
> > > >          Min xfer                                        =  4491704.00 kB
> > > >
> > > > We also did some dd tests; the write speed in a single test on our
> > > > standard server is ~50MB/s, but on a server with very large memory the
> > > > speed is nearly double, ~80-90MB/s.
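> > > >
> > > > (The dd runs were of this general form; the path, block size and
> > > > direct-I/O flag shown are illustrative rather than the exact command:)
> > > >
> > > > # dd if=/dev/zero of=/mnt/cephfs/dd.test bs=1M count=10240 oflag=direct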
> > > >
> > > > We have zero experience with Ceph and, as said, we haven't done any
> > > > further tuning at this stage. Is this sort of performance far too low
> > > > for hardware of this spec?
> > > >
> > > > Any hints will be appreciated.
> > > >
> > > > Cheers
> > > > D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



