Re: BAD nvme SSD performance

Hello,

On Mon, 26 Oct 2015 14:35:19 +0100 Wido den Hollander wrote:

> 
> 
> On 26-10-15 14:29, Matteo Dacrema wrote:
> > Hi Nick,
> > 
> >  
> > 
> > I also tried to increase iodepth but nothing has changed.
> > 
> >  
> > 
> > With iostat I noticed that the disk is fully utilized and write per
> > seconds from iostat match fio output.
> > 
> 
> Ceph isn't fully optimized to get the maximum potential out of NVME SSDs
> yet.
> 
Indeed. Don't expect Ceph to come anywhere near raw SSD performance.

However, he writes that per iostat the SSD is fully utilized.

Matteo, can you run atop instead of iostat (see the commands below) and confirm that:

a) utilization of the SSD is 100%.
b) CPU is not the bottleneck. 
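
Something along these lines should do; a rough sketch, assuming the device
shows up as nvme0n1 (adjust to whatever yours is called):

  # extended per-device stats (utilization, queue size, latencies), 1s interval
  iostat -x nvme0n1 1

  # atop shows disk busy% and per-core CPU usage side by side, 1s interval
  atop 1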

My guess would be that these particular NVMe SSDs simply suffer from the
same direct sync I/O deficiencies as other Samsung SSDs.
This suspicion is reinforced by Samsung listing them as client SSDs, not
data center ones:
http://www.samsung.com/semiconductor/products/flash-storage/client-ssd/MZHPV256HDGL?ia=831
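
If you want to verify that outside of Ceph, the usual check is a
single-threaded O_SYNC 4k write test straight on the SSD, the same kind of
test people use to qualify journal devices. A rough sketch (the file path
is just an example, point it at a scratch file or spare partition, not at
anything you care about):

  fio --name=journal-test --filename=/mnt/nvme/fio-test --size=1G \
      --ioengine=libaio --direct=1 --sync=1 --bs=4k --rw=write \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based

Data center SSDs tend to hold up well here, while many consumer/client
models drop to a small fraction of their advertised write IOPS.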

Regards,

Christian

> For example, NVMe SSDs work best with very high queue depths and
> parallel IOPS.
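> 
> For instance, if your fio is built with rbd support, you can drive an
> RBD image directly at a high queue depth (just a sketch, the
> pool/image/client names are placeholders):
> 
> fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
>     --bs=4k --rw=randwrite --iodepth=64 --runtime=60 --time_based \
>     --name=rbd-qd64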
> 
> Also, be aware that Ceph adds multiple layers to the whole I/O subsystem
> and that there will be a performance impact when Ceph is used in between.
> 
> Wido
> 
> >  
> > 
> > Matteo
> > 
> >  
> > 
> > *From:*Nick Fisk [mailto:nick@xxxxxxxxxx]
> > *Sent:* lunedì 26 ottobre 2015 13:06
> > *To:* Matteo Dacrema <mdacrema@xxxxxxxx>; ceph-users@xxxxxxxx
> > *Subject:* RE: BAD nvme SSD performance
> > 
> >  
> > 
> > Hi Matteo,
> > 
> >  
> > 
> > Ceph introduces latency into the write path, so what you are seeing is
> > typical. If you increase the iodepth of the fio test you should get
> > higher results, up to the point where you start maxing out your CPU.
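> > 
> > For example (illustrative numbers only, tune to taste), your command
> > with a real queue depth and fewer jobs:
> > 
> > fio --ioengine=libaio --direct=1 --name=test --filename=test --bs=4k \
> >     --size=100M --readwrite=randwrite --iodepth=32 --numjobs=4 \
> >     --group_reporting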
> > 
> >  
> > 
> > Nick
> > 
> >  
> > 
> > *From:*ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] *On Behalf
> > Of *Matteo Dacrema
> > *Sent:* 26 October 2015 11:20
> > *To:* ceph-users@xxxxxxxx <mailto:ceph-users@xxxxxxxx>
> > *Subject:*  BAD nvme SSD performance
> > 
> >  
> > 
> > Hi all,
> > 
> >  
> > 
> > I’ve recently bought two Samsung SM951 256GB NVMe PCIe SSDs and built a
> > 2-OSD Ceph cluster with min_size = 1.
> > 
> > I’ve tested them with fio and obtained two very different results in
> > the following two situations.
> > 
> > This is the command: *fio --ioengine=libaio --direct=1 --name=test
> > --filename=test --bs=4k --size=100M --readwrite=randwrite
> > --numjobs=200 --group_reporting*
> > 
> >  
> > 
> > On the OSD host I’ve obtained this result:
> > 
> > *bw=575493KB/s, iops=143873*
> > 
> > 
> > On the client host, with fio run against a mounted volume, I’ve
> > obtained this result:
> > 
> > *bw=9288.1KB/s, iops=2322*
> > 
> > 
> > I’ve obtained these results both with the journal and data on the same
> > disk and with the journal on a separate SSD.
> > 
> > 
> > I have two OSD hosts, each with 64GB of RAM and 2x Intel Xeon E5-2620 @
> > 2.00GHz, and one MON host with 128GB of RAM and 2x Intel Xeon E5-2620 @
> > 2.00GHz.
> > 
> > I’m using 10G Mellanox NICs and a switch with jumbo frames.
> > 
> >  
> > 
> > I also did other tests with this configuration (see the attached Excel
> > workbook).
> > 
> > Hardware configuration for each of the two OSD nodes:
> > 
> >                 3x  100GB Intel SSD DC S3700, with 3x 30GB partitions
> > on every SSD
> > 
> >                 9x  1TB Seagate HDD
> > 
> > Results: about *12k* IOPS with 4k bs and the same fio test.
> > 
> >  
> > 
> > I can’t understand where the problem is with the NVMe SSDs.
> > 
> > Can anyone help me?
> > 
> >  
> > 
> > Here is the *ceph.conf*:
> > 
> > [global]
> > 
> > fsid = 3392a053-7b48-49d3-8fc9-50f245513cc7
> > 
> > mon_initial_members = mon1
> > 
> > mon_host = 192.168.1.3
> > 
> > auth_cluster_required = cephx
> > 
> > auth_service_required = cephx
> > 
> > auth_client_required = cephx
> > 
> > osd_pool_default_size = 2
> > 
> > mon_client_hung_interval = 1.0
> > 
> > mon_client_ping_interval = 5.0
> > 
> > public_network = 192.168.1.0/24
> > 
> > cluster_network = 192.168.1.0/24
> > 
> > mon_osd_full_ratio = .90
> > 
> > mon_osd_nearfull_ratio = .85
> > 
> >  
> > 
> > [mon]
> > 
> > mon_warn_on_legacy_crush_tunables = false
> > 
> >  
> > 
> > [mon.1]
> > 
> > host = mon1
> > 
> > mon_addr = 192.168.1.3:6789
> > 
> >  
> > 
> > [osd]
> > 
> > osd_journal_size = 30000
> > 
> > journal_dio = true
> > 
> > journal_aio = true
> > 
> > osd_op_threads = 24
> > 
> > osd_op_thread_timeout = 60
> > 
> > osd_disk_threads = 8
> > 
> > osd_recovery_threads = 2
> > 
> > osd_recovery_max_active = 1
> > 
> > osd_max_backfills = 2
> > 
> > osd_mkfs_type = xfs
> > 
> > osd_mkfs_options_xfs = "-f -i size=2048"
> > 
> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
> > 
> > filestore_xattr_use_omap = false
> > 
> > filestore_max_inline_xattr_size = 512
> > 
> > filestore_max_sync_interval = 10
> > 
> > filestore_merge_threshold = 40
> > 
> > filestore_split_multiple = 8
> > 
> > filestore_flusher = false
> > 
> > filestore_queue_max_ops = 2000
> > 
> > filestore_queue_max_bytes = 536870912
> > 
> > filestore_queue_committing_max_ops = 500
> > 
> > filestore_queue_committing_max_bytes = 268435456
> > 
> > filestore_op_threads = 2
> > 
> >  
> > 
> > Best regards,
> > 
> > Matteo
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



