Re: Ceph cluster with SSDs

Please don't remove the ML.
I'm not a support channel and if I reply to mails it is so that
others hopefully will learn from that. 
ML re-added.

On Mon, 11 Sep 2017 16:30:18 +0530 M Ranga Swami Reddy wrote:

> >>> >> Here I have NVMes from Intel, but as Intel does not provide support
> >>> >> for these NVMes, we decided not to use them as journals.  
> >>> >
> >>> > You again fail to provide with specific model numbers...  
> >>>
> >>> NVMe - Intel DC P3608 - 1.6TB  
> >>
> >> 3 DWPD, so you could put this in front (as journal) of 30 or so of those
> >> Samsungs and it would still last longer.  
> >
> >
> > Sure, I will try this and update the results.
> >
> > Btw, the "osd bench" is showing very poor numbers with these SSD-based
> > OSDs (as compared with the HDD-based OSDs). For example: HDD-based OSDs
> > show around 500 MB/s and SSD-based OSDs show < 300 MB/s. It's strange to
> > see these results.
> > Did I miss anything here?  
> 
OSD bench still is the wrong test tool for pretty much anything.
There are no HDDs that write 500MB/s.
So this is either a RAID or something behind a controller with HW cache,
not the 140MB/s or so I'd expect to see with a directly connected HDD.
OSD bench also only writes 1GB by default, something that's easily cached
in such a setup.
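For reference, the bench command also takes the total bytes and the block
size as arguments, so it's easy to write more than the default 1GB and get
past a controller cache. A rough sketch (osd.0 is just an example, and
depending on your release the OSD may refuse larger totals unless the
osd_bench_* limits are raised):

  # default run, 1GB total, easily absorbed by a HW cache
  ceph tell osd.0 bench
  # 2GB total in 4MB blocks
  ceph tell osd.0 bench 2147483648 4194304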

The 300MB/s for your EVO SSDs could be the result of how OSD bench works
(sync writes, does it use the journal?) or something as simple and silly as
these SSDs being hooked up to SATA-2 (3Gb/s, aka ~300MB/s) ports.
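Both are easy enough to rule out, along these lines (sdX stands for one of
the EVOs; the fio line is the usual sync/direct journal-style write test
and will destroy data, so only point it at an unused device):

  # negotiated link speed, should report 6.0 Gb/s on SATA-3
  smartctl -a /dev/sdX | grep -i 'sata version'
  # 4k sync writes, i.e. the kind of I/O journaling does
  fio --name=journaltest --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60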

> 
> After adding the NVMe drives as journals, I could see improved osd bench
> results (> 600 MB/s, versus < 300 MB/s without NVMe).
> 
What exactly did you do?
How many journals per NVMe?
6 or 7 of the EVOs (at 300 MB/s) will saturate the P3608 in a bandwidth
test. 
4 of the EVOs if their 500MB/s write speed can be achieved.
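(Back of the envelope: 6-7 x 300 MB/s is 1800-2100 MB/s and 4 x 500 MB/s is
2000 MB/s, which is roughly the sequential write bandwidth that card can
sustain.)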

And since you never mentioned any other details, your cluster could also
be network or CPU bound for all we know.
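If in doubt, a quick sanity check along these lines (the host name is just
a placeholder):

  # raw network bandwidth between an OSD node and a client node
  iperf3 -s                # on the OSD node
  iperf3 -c osd-node-1     # on the client
  # CPU usage on the OSD nodes while a benchmark is running
  sar -u 1                 # or simply top/atop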

> But volumes created from the SSD pool are not showing any performance
> improvements (in dd output, fio, rbd map, rados bench, etc.).

If you're using fio with the rbd ioengine, I found it to be horribly
buggy.
The best real-life test is fio inside a VM.
And test for IOPS (4k ones); bandwidth is most likely NOT what you will
lack in production. 
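As a starting point, something like this inside a test VM whose disk lives
on the SSD pool (paths and sizes are just placeholders):

  fio --name=4k-randwrite --ioengine=libaio --direct=1 --rw=randwrite \
      --bs=4k --iodepth=32 --numjobs=1 --size=4G --runtime=60 \
      --time_based --filename=/mnt/test/fio.dat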

> Did I miss any Ceph config setting to achieve good performance numbers?
>
Not likely, no.
 
> Thanks
> Swami
> 
> 
> >>> > No support from Intel suggests that these may be consumer models again.
> >>> >
> >>> > Samsung also makes DC grade SSDs and NVMEs, as Adrian pointed out.
> >>> >  
> >>> >> Btw, if we split this SSD into multiple OSDs (for ex: 1 SSD with 4 or 2
> >>> >> OSDs), does this help any performance numbers?
> >>> >>  
> >>> > Of course not, if anything it will make it worse due to the overhead
> >>> > outside the SSD itself.
> >>> >
> >>> > Christian
> >>> >  
> >>> >> On Sun, Aug 20, 2017 at 9:33 AM, Christian Balzer <chibi@xxxxxxx> wrote:  
> >>> >> >
> >>> >> > Hello,
> >>> >> >
> >>> >> > On Sat, 19 Aug 2017 23:22:11 +0530 M Ranga Swami Reddy wrote:
> >>> >> >  
> >>> >> >> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage -
> >>> >> >> MZ-75E4T0B/AM | Samsung
> >>> >> >>  
> >>> >> > And there's your answer.
> >>> >> >
> >>> >> > A bit of googling in the archives here would have shown you that these are
> >>> >> > TOTALLY unsuitable for use with Ceph.
> >>> >> > Not only because of the horrid speed when used with/for Ceph journaling
> >>> >> > (direct/sync I/O) but also their abysmal endurance of 0.04 DWPD over 5
> >>> >> > years.
> >>> >> > Or in other words 160GB/day, which after the Ceph journal double writes
> >>> >> > and FS journals, other overhead and write amplification in general
> >>> >> > probably means less than an effective 40GB/day.
> >>> >> >
> >>> >> > In contrast the lowest endurance DC grade SSDs tend to be 0.3 DWPD and
> >>> >> > more commonly 1 DWPD.
> >>> >> > And I'm not buying anything below 3 DWPD for use with Ceph.
> >>> >> >
> >>> >> > Your only chance to improve the speed here is to take the journals off
> >>> >> > them and put them onto fast and durable enough NVMes like the Intel DC
> >>> >> > P3700 or at worst P3600 types.
> >>> >> >
> >>> >> > That still leaves you with their crappy endurance, only twice as high as
> >>> >> > before with the journals offloaded.
> >>> >> >
> >>> >> > Christian
> >>> >> >  
> >>> >> >> On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy
> >>> >> >> <swamireddy@xxxxxxxxx> wrote:  
> >>> >> >> > Yes, it's in production, and the PG count is as per the PG calculator @ ceph.com.
> >>> >> >> >
> >>> >> >> > On Fri, Aug 18, 2017 at 3:30 AM, Mehmet <ceph@xxxxxxxxxx> wrote:  
> >>> >> >> >> Which SSDs are used? Are they in production? If so, what is your PG count?
> >>> >> >> >>
> >>> >> >> >> On 17 August 2017 at 20:04:25 MESZ, M Ranga Swami Reddy
> >>> >> >> >> <swamireddy@xxxxxxxxx> wrote:  
> >>> >> >> >>>
> >>> >> >> >>> Hello,
> >>> >> >> >>> I am using a Ceph cluster with HDDs and SSDs, with a separate pool
> >>> >> >> >>> created for each.
> >>> >> >> >>> Now, when I run "ceph osd bench", the HDD OSDs show around 500 MB/s
> >>> >> >> >>> and the SSD OSDs show around 280 MB/s.
> >>> >> >> >>>
> >>> >> >> >>> Ideally, what I expected was that the SSD OSDs should be at least 40%
> >>> >> >> >>> higher than the HDD OSD bench results.
> >>> >> >> >>>
> >>> >> >> >>> Did I miss anything here? Any hint is appreciated.
> >>> >> >> >>>
> >>> >> >> >>> Thanks
> >>> >> >> >>> Swami
> >>> >> >
> >>> >> >
> >>> >> > --
> >>> >> > Christian Balzer        Network/Systems Engineer
> >>> >> > chibi@xxxxxxx           Rakuten Communications  
> >>> >>  
> >>> >
> >>> >
> >>> > --
> >>> > Christian Balzer        Network/Systems Engineer
> >>> > chibi@xxxxxxx           Rakuten Communications  
> >>>  
> >>
> >>
> >> --
> >> Christian Balzer        Network/Systems Engineer
> >> chibi@xxxxxxx           Rakuten Communications  
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


