Hi guys,

I tried mounting via the kernel driver and it works beautifully; I was surprised. Below is one of the FIO tests, which wasn't able to run at all on the FUSE mount:

# /usr/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=FIO --filename=fio.test --bs=4M --iodepth=16 --size=50G --readwrite=randrw --rwmixread=75

FIO: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
fio-3.7
Starting 1 process
FIO: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=1021MiB/s,w=340MiB/s][r=255,w=85 IOPS][eta 00m:00s]
FIO: (groupid=0, jobs=1): err= 0: pid=131431: Thu Jun 11 17:13:22 2020
  read: IOPS=249, BW=999MiB/s (1047MB/s)(37.5GiB/38408msec)
   bw ( KiB/s): min=819200, max=1171456, per=100.00%, avg=1023387.46, stdev=69360.06, samples=76
   iops       : min= 200, max= 286, avg=249.83, stdev=16.96, samples=76
  write: IOPS=83, BW=334MiB/s (351MB/s)(12.5GiB/38408msec)
   bw ( KiB/s): min=229376, max=475136, per=99.96%, avg=342204.45, stdev=40407.55, samples=76
   iops       : min= 56, max= 116, avg=83.51, stdev= 9.87, samples=76
  cpu        : usr=1.56%, sys=4.44%, ctx=12050, majf=0, minf=24
  IO depths  : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
     submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=9590,3210,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=999MiB/s (1047MB/s), 999MiB/s-999MiB/s (1047MB/s-1047MB/s), io=37.5GiB (40.2GB), run=38408-38408msec
  WRITE: bw=334MiB/s (351MB/s), 334MiB/s-334MiB/s (351MB/s-351MB/s), io=12.5GiB (13.5GB), run=38408-38408msec

On Tue, Jun 9, 2020 at 6:16 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Derrick,
>
> I am not sure what this 200-300MB/s on hdd is, but it is probably not
> really relevant. I am testing native disk performance before I use the
> disks with ceph, using the fio script below. It is a bit lengthy because
> I want to have data for possible future use cases.
>
> Furthermore, since I upgraded to Nautilus I have been having issues with
> the kernel mount of cephfs on osd nodes and had to revert back to fuse,
> even when having 88GB of free memory.
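>
> (Just to make explicit which two clients are being compared in this
> thread - the commands below are only an illustration, not my exact
> setup; the monitor names, client name, filesystem path and secret file
> path are placeholders for whatever your cluster uses:
>
>   # CephFS via the kernel client
>   mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
>
>   # CephFS via the FUSE client
>   ceph-fuse -n client.admin /mnt/cephfs
>
> Same filesystem either way; only the client differs.)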
>
> https://tracker.ceph.com/issues/45663
> https://tracker.ceph.com/issues/44100
>
>
> [global]
> ioengine=libaio
> #ioengine=posixaio
> invalidate=1
> ramp_time=30
> iodepth=1
> runtime=180
> time_based
> direct=1
> filename=/dev/sdX
> #filename=/mnt/cephfs/ssd/fio-bench.img
>
> [write-4k-seq]
> stonewall
> bs=4k
> rw=write
> #write_bw_log=sdx-4k-write-seq.results
> #write_iops_log=sdx-4k-write-seq.results
>
> [randwrite-4k-seq]
> stonewall
> bs=4k
> rw=randwrite
> #write_bw_log=sdx-4k-randwrite-seq.results
> #write_iops_log=sdx-4k-randwrite-seq.results
>
> [read-4k-seq]
> stonewall
> bs=4k
> rw=read
> #write_bw_log=sdx-4k-read-seq.results
> #write_iops_log=sdx-4k-read-seq.results
>
> [randread-4k-seq]
> stonewall
> bs=4k
> rw=randread
> #write_bw_log=sdx-4k-randread-seq.results
> #write_iops_log=sdx-4k-randread-seq.results
>
> [rw-4k-seq]
> stonewall
> bs=4k
> rw=rw
> #write_bw_log=sdx-4k-rw-seq.results
> #write_iops_log=sdx-4k-rw-seq.results
>
> [randrw-4k-seq]
> stonewall
> bs=4k
> rw=randrw
> #write_bw_log=sdx-4k-randrw-seq.results
> #write_iops_log=sdx-4k-randrw-seq.results
>
> [write-128k-seq]
> stonewall
> bs=128k
> rw=write
> #write_bw_log=sdx-128k-write-seq.results
> #write_iops_log=sdx-128k-write-seq.results
>
> [randwrite-128k-seq]
> stonewall
> bs=128k
> rw=randwrite
> #write_bw_log=sdx-128k-randwrite-seq.results
> #write_iops_log=sdx-128k-randwrite-seq.results
>
> [read-128k-seq]
> stonewall
> bs=128k
> rw=read
> #write_bw_log=sdx-128k-read-seq.results
> #write_iops_log=sdx-128k-read-seq.results
>
> [randread-128k-seq]
> stonewall
> bs=128k
> rw=randread
> #write_bw_log=sdx-128k-randread-seq.results
> #write_iops_log=sdx-128k-randread-seq.results
>
> [rw-128k-seq]
> stonewall
> bs=128k
> rw=rw
> #write_bw_log=sdx-128k-rw-seq.results
> #write_iops_log=sdx-128k-rw-seq.results
>
> [randrw-128k-seq]
> stonewall
> bs=128k
> rw=randrw
> #write_bw_log=sdx-128k-randrw-seq.results
> #write_iops_log=sdx-128k-randrw-seq.results
>
> [write-1024k-seq]
> stonewall
> bs=1024k
> rw=write
> #write_bw_log=sdx-1024k-write-seq.results
> #write_iops_log=sdx-1024k-write-seq.results
>
> [randwrite-1024k-seq]
> stonewall
> bs=1024k
> rw=randwrite
> #write_bw_log=sdx-1024k-randwrite-seq.results
> #write_iops_log=sdx-1024k-randwrite-seq.results
>
> [read-1024k-seq]
> stonewall
> bs=1024k
> rw=read
> #write_bw_log=sdx-1024k-read-seq.results
> #write_iops_log=sdx-1024k-read-seq.results
>
> [randread-1024k-seq]
> stonewall
> bs=1024k
> rw=randread
> #write_bw_log=sdx-1024k-randread-seq.results
> #write_iops_log=sdx-1024k-randread-seq.results
>
> [rw-1024k-seq]
> stonewall
> bs=1024k
> rw=rw
> #write_bw_log=sdx-1024k-rw-seq.results
> #write_iops_log=sdx-1024k-rw-seq.results
>
> [randrw-1024k-seq]
> stonewall
> bs=1024k
> rw=randrw
> #write_bw_log=sdx-1024k-randrw-seq.results
> #write_iops_log=sdx-1024k-randrw-seq.results
>
> [write-4096k-seq]
> stonewall
> bs=4096k
> rw=write
> #write_bw_log=sdx-4096k-write-seq.results
> #write_iops_log=sdx-4096k-write-seq.results
>
> [randwrite-4096k-seq]
> stonewall
> bs=4096k
> rw=randwrite
> #write_bw_log=sdx-4096k-randwrite-seq.results
> #write_iops_log=sdx-4096k-randwrite-seq.results
>
> [read-4096k-seq]
> stonewall
> bs=4096k
> rw=read
> #write_bw_log=sdx-4096k-read-seq.results
> #write_iops_log=sdx-4096k-read-seq.results
>
> [randread-4096k-seq]
> stonewall
> bs=4096k
> rw=randread
> #write_bw_log=sdx-4096k-randread-seq.results
> #write_iops_log=sdx-4096k-randread-seq.results
>
> [rw-4096k-seq]
> stonewall
> bs=4096k
> rw=rw
> #write_bw_log=sdx-4096k-rw-seq.results
> #write_iops_log=sdx-4096k-rw-seq.results
>
> [randrw-4096k-seq]
> stonewall
> bs=4096k
> rw=randrw
> #write_bw_log=sdx-4096k-randrw-seq.results
> #write_iops_log=sdx-4096k-randrw-seq.results
>
>
>
> -----Original Message-----
> From: Derrick Lin [mailto:klin938@xxxxxxxxx]
> Sent: Tuesday, 9 June 2020 4:12
> To: Mark Nelson
> Cc: ceph-users@xxxxxxx
> Subject: Re: poor cephFS performance on Nautilus 14.2.9 deployed by ceph_ansible
>
> Thanks Mark & Marc,
>
> We will do more testing, including the kernel client, as well as testing
> the block storage performance first.
>
> We just did some direct raw performance tests on a single spinning disk
> (formatted as ext4) and it could deliver 200-300MB/s throughput in
> various write and mixed tests. But the FUSE client could only give
> ~50MB/s.
>
> Cheers,
> D
>
> On Thu, Jun 4, 2020 at 1:27 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> >
> > Try using the kernel client instead of the FUSE client. The FUSE
> > client is known to be slow for a variety of reasons and I suspect you
> > may see faster performance with the kernel client.
> >
> > Thanks,
> >
> > Mark
> >
> > On 6/2/20 8:00 PM, Derrick Lin wrote:
> > > Hi guys,
> > >
> > > We just deployed a CEPH 14.2.9 cluster with the following hardware:
> > >
> > > MDS x 1
> > > Xeon Gold 5122 3.6GHz
> > > 192GB
> > > Mellanox ConnectX-4 Lx 25GbE
> > >
> > > MON x 3
> > > Xeon Bronze 3103 1.7GHz
> > > 48GB
> > > Mellanox ConnectX-4 Lx 25GbE
> > > 6 x 600GB 10K SAS
> > >
> > > OSD x 5
> > > Xeon Silver 4110 2.1GHz x 2
> > > 192GB
> > > Mellanox ConnectX-4 Lx 25GbE
> > > 16 x 10TB 7.2K NLSAS (block)
> > > 2 x 2TB Intel P4600 NVMe (block.db)
> > >
> > > The network is all Mellanox SN2410/SN2700, configured at 25GbE for
> > > both the front and back networks.
> > >
> > > Just as a POC at this stage, the cluster was deployed by ceph_ansible
> > > without much customization, and the initial test of its cephFS FUSE
> > > mount performance seems to be very low. We did some tests with
> > > iozone; the results are as follows:
> > >
> > > ]# /opt/iozone/bin/iozone -i 0 -i 1-r 128k -s 5G -t 20
> > > Iozone: Performance Test of File I/O
> > >         Version $Revision: 3.465 $
> > >         Compiled for 64 bit mode.
> > >         Build: linux-AMD64
> > >
> > > Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
> > >               Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
> > >               Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
> > >               Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
> > >               Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
> > >               Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
> > >               Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
> > >               Vangel Bojaxhi, Ben England, Vikentsi Lapa,
> > >               Alexey Skidanov.
> > >
> > > Run began: Tue Jun 2 16:40:53 2020
> > >
> > > File size set to 5242880 kB
> > > Command line used: /opt/iozone/bin/iozone -i 0 -i 1-r 128k -s 5G -t 20
> > > Output is in kBytes/sec
> > > Time Resolution = 0.000001 seconds.
> > > Processor cache size set to 1024 kBytes.
> > > Processor cache line size set to 32 bytes.
> > > File stride size set to 17 * record size.
> > > Throughput test with 20 processes
> > > Each process writes a 5242880 kByte file in 4 kByte records
> > >
> > > Children see throughput for 20 initial writers  =    35001.12 kB/sec
> > > Parent sees throughput for 20 initial writers   =    34967.65 kB/sec
> > > Min throughput per process                      =     1748.22 kB/sec
> > > Max throughput per process                      =     1751.62 kB/sec
> > > Avg throughput per process                      =     1750.06 kB/sec
> > > Min xfer                                        =  5232724.00 kB
> > >
> > > Children see throughput for 20 rewriters        =    35704.79 kB/sec
> > > Parent sees throughput for 20 rewriters         =    35704.30 kB/sec
> > > Min throughput per process                      =     1783.44 kB/sec
> > > Max throughput per process                      =     1786.29 kB/sec
> > > Avg throughput per process                      =     1785.24 kB/sec
> > > Min xfer                                        =  5234532.00 kB
> > >
> > > Children see throughput for 20 readers          = 49368539.50 kB/sec
> > > Parent sees throughput for 20 readers           = 49317231.38 kB/sec
> > > Min throughput per process                      =  2414424.00 kB/sec
> > > Max throughput per process                      =  2599996.25 kB/sec
> > > Avg throughput per process                      =  2468426.98 kB/sec
> > > Min xfer                                        =  4868708.00 kB
> > >
> > > Children see throughput for 20 re-readers       = 48675891.50 kB/sec
> > > Parent sees throughput for 20 re-readers        = 48617335.67 kB/sec
> > > Min throughput per process                      =  2316395.25 kB/sec
> > > Max throughput per process                      =  2703868.75 kB/sec
> > > Avg throughput per process                      =  2433794.58 kB/sec
> > > Min xfer                                        =  4491704.00 kB
> > >
> > > We also did some dd tests; the write speed in a single test on our
> > > standard server is ~50MB/s, but on a server with very large memory
> > > the speed is about double, ~80-90MB/s.
> > >
> > > We have zero experience with ceph and, as said, we haven't done more
> > > tuning at this stage. But is this sort of performance way too low for
> > > this hardware spec?
> > >
> > > Any hints will be appreciated.
> > >
> > > Cheers
> > > D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx