Re: Fwd: [Ceph-community] Wasting the Storage capacity when using Ceph based On high-end storage systems

Hi Oliver,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Oliver Dzombic
> Sent: 30 May 2016 16:32
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Fwd: [Ceph-community] Wasting the Storage
> capacity when using Ceph based On high-end storage systems
> 
> Hi,
> 
> E3 CPUs have 4 cores, with HT. So 8 logical cores. And they are not
> multi-CPU.
> 
> That means you will naturally be limited quite quickly in the number of
> OSDs you can run with that.

I'm hoping to be able to run 12, do you think that will be a struggle?

> 
> Because no matter how many GHz it has, the OSD process occupies a CPU
> core forever.

I'm not sure I agree with this point. An OSD process is composed of tens of
threads which, unless you have pinned the process to a single core, will be
running across all the cores of the CPU. As far as I'm aware, each of these
threads is given a ~10ms time slice and is then scheduled onto the next
available core. A 4x 4GHz CPU will run each of these threads faster than an
8x 2GHz CPU, and this is where the latency advantages are seen.
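
As a rough illustration, here is a small Python sketch that counts the
threads of each running ceph-osd daemon by reading /proc (Linux only, run
on the OSD node itself):

    #!/usr/bin/env python
    # Rough sketch: count threads per running ceph-osd by reading /proc.
    import os

    def osd_thread_counts():
        counts = {}
        for pid in os.listdir('/proc'):
            if not pid.isdigit():
                continue
            try:
                with open('/proc/%s/comm' % pid) as f:
                    if f.read().strip() != 'ceph-osd':
                        continue
                # Each entry under /proc/<pid>/task is one thread.
                counts[int(pid)] = len(os.listdir('/proc/%s/task' % pid))
            except (IOError, OSError):
                # Process may have exited while we were looking at it.
                continue
        return counts

    if __name__ == '__main__':
        for pid, nthreads in sorted(osd_thread_counts().items()):
            print('ceph-osd pid %d: %d threads' % (pid, nthreads))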

If you get to the point where you have hundreds of threads all demanding CPU
time, a 4x 4GHz CPU will be roughly the same speed as an 8x 2GHz CPU. Yes,
there are half as many cores available, but each core completes its work in
half the time. There may be some advantage with ever increasing thread
counts, but there are also disadvantages with memory/IO access over the
inter-CPU link in the case of dual sockets.
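
To put crude numbers on that, a back-of-envelope sketch (purely
illustrative: it assumes perfectly parallel work and ignores caches, NUMA
and scheduler overhead):

    # Compare 4x 4GHz against 8x 2GHz under a heavy thread load.
    SLICE_GCYCLES = 0.01  # ~10M cycles of work per time slice (made-up figure)

    for cores, clock_ghz in [(4, 4.0), (8, 2.0)]:
        aggregate = cores * clock_ghz                  # total Gcycles/sec
        slice_ms = SLICE_GCYCLES / clock_ghz * 1000.0  # time to finish one slice
        print('%dx %.0fGHz: %4.1f Gcycles/s aggregate, %.2f ms per slice'
              % (cores, clock_ghz, aggregate, slice_ms))

Both come out at 16 Gcycles/s of aggregate capacity, but the 4GHz part
finishes each individual slice in half the time, which is where the latency
difference shows up.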

> Not at 100%, but still enough to ruin your day, if you have 8 logical
> cores and 12 disks (in scrubbing/backfilling/high load).

I did some testing with a 12-core 2GHz Xeon E5 (2x6) by disabling 8 of the
cores, and performance was sufficient. I know the E3 and E5 are different
CPU families, but hopefully this was a good enough test.
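
For anyone wanting to repeat that sort of test, cores can be taken offline
through the Linux CPU hotplug interface in sysfs; a rough Python sketch of
the idea (run as root, and note that cpu0 usually cannot be offlined; this
is the general approach rather than my exact procedure):

    # Take a list of CPUs offline so a wider CPU can stand in for a
    # smaller one during testing, e.g. "offline_cpus.py 4 5 6 7 8 9 10 11"
    # to turn a 12-core chip into an effective 4-core one.
    import sys

    def set_cpu_online(cpu, online):
        path = '/sys/devices/system/cpu/cpu%d/online' % cpu
        with open(path, 'w') as f:
            f.write('1' if online else '0')

    if __name__ == '__main__':
        for arg in sys.argv[1:]:
            set_cpu_online(int(arg), False)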

> 
> So all single-socket CPUs are just good for a very limited number of OSDs.

> --
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:info@xxxxxxxxxxxxxxxxx
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
> Am 30.05.2016 um 17:13 schrieb Christian Balzer:
> >
> > Hello,
> >
> > On Mon, 30 May 2016 09:40:11 +0100 Nick Fisk wrote:
> >
> >> The other option is to scale out rather than scale up. I'm currently
> >> building nodes based on a fast Xeon E3 with 12 Drives in 1U. The
> >> MB/CPU is very attractively priced and the higher clock gives you
> >> much lower write latency if that is important. The density is
> >> slightly lower, but I guess you gain an advantage in more granularity
> >> of the cluster.
> >>
> > Most definitely, granularity and number of OSDs (up to a point, mind
> > ya) is a good thing [TM].
> >
> > I was citing the designs I did to basically counter the "not dense
> > enough" argument.
> >
> > Ultimately with Ceph (unless you throw lots of money and brain cells
> > at it), the less dense, the better it will perform.
> >
> > Christian
> >>> -----Original Message-----
> >>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >>> Behalf Of Jack Makenz
> >>> Sent: 30 May 2016 08:40
> >>> To: Christian Balzer <chibi@xxxxxxx>
> >>> Cc: ceph-users@xxxxxxxxxxxxxx
> >>> Subject: Re:  Fwd: [Ceph-community] Wasting the Storage
> >>> capacity when using Ceph based On high-end storage systems
> >>>
> >>> Thanks Christian, and all of ceph users
> >>>
> >>> Your guidance was very helpful, appreciate !
> >>>
> >>> Regards
> >>> Jack Makenz
> >>>
> >>> On Mon, May 30, 2016 at 11:08 AM, Christian Balzer <chibi@xxxxxxx>
> >>> wrote:
> >>>
> >>> Hello,
> >>>
> >>> you may want to read up on the various high-density node threads and
> >>> conversations here.
> >>>
> >>> You most certainly do NOT need high-end storage systems to create
> >>> multi-petabyte storage systems with Ceph.
> >>>
> >>> If you were to use these chassis as a basis:
> >>>
> >>> https://www.supermicro.com.tw/products/system/4U/6048/SSG-6048R-E1CR60N.cfm
> >>> [We (and surely others) urged Supermicro to provide a design like
> >>> this]
> >>>
> >>> And fill them with 6TB HDDs, configure them as 5x 12HDD RAID6s, set
> >>> your replication to 2 in Ceph, you will wind up with VERY reliable,
> >>> resilient 1.2PB per rack (32U, leaving space for other bits and not
> >>> melting the PDUs).
> >>> Add fast SSDs or NVMes to this case for journals and you have
> >>> decently performing mass storage.
> >>>
> >>> Need more IOPS for really hot data?
> >>> Add a cache tier or dedicated SSD pools for special needs/customers.
> >>>
> >>> Alternatively, do "classic" Ceph with 3x replication or EC coding,
> >>> but in either case (even more so with EC) you will need the most
> >>> firebreathing CPUs available, so compared to the above design it may
> >>> be a zero sum game cost wise, if not performance wise as well.
> >>> This leaves you with 960TB in the same space when doing 3x
> >>> replication.
> >>>
> >>> A middle of the road approach would be to use RAID1 or 10 based OSDs
> >>> to bring down the computational needs in exchange for higher storage
> >>> costs (effective 4x replication).
> >>> This only gives you 720TB, alas it will be easier (and cheaper CPU
> >>> cost wise) to achieve peak performance with this approach compared
> >>> to the one above with 60 OSDs per node.
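
(As an aside, the 1.2PB / 960TB / 720TB figures above follow from
straightforward arithmetic; a quick Python sketch of it, assuming eight of
those 4U/60-bay chassis fill the 32U:)

    # Capacity arithmetic behind the figures above (assumes 8 chassis/rack).
    CHASSIS, DRIVES_PER_CHASSIS, DRIVE_TB = 8, 60, 6.0
    raw_tb = CHASSIS * DRIVES_PER_CHASSIS * DRIVE_TB    # 2880 TB raw

    # 5x 12-disk RAID6 per chassis (10/12 usable) plus 2x Ceph replication
    raid6_tb = raw_tb * 10.0 / 12.0 / 2.0               # 1200 TB = 1.2PB

    # "Classic" Ceph with 3x replication straight onto the raw drives
    rep3_tb = raw_tb / 3.0                              # 960 TB

    # RAID1/10 based OSDs plus 2x replication = effective 4x
    raid10_tb = raw_tb / 4.0                            # 720 TB

    print('RAID6+2x: %.0f TB, 3x rep: %.0f TB, RAID1/10+2x: %.0f TB'
          % (raid6_tb, rep3_tb, raid10_tb))
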
> >>>
> >>> Lastly, I give you this (and not being a fan of Fujitsu, mind):
> >>> http://www.fujitsu.com/global/products/computing/storage/eternus-cd/
> >>>
> >>> Christian
> >>>
> >>> On Mon, 30 May 2016 10:25:35 +0430 Jack Makenz wrote:
> >>>
> >>>> Forwarded conversation
> >>>> Subject: Wasting the Storage capacity when using Ceph based On
> >>>> high-end storage systems
> >>>> ------------------------
> >>>>
> >>>> From: *Jack Makenz* <jack.makenz@xxxxxxxxx>
> >>>> Date: Sun, May 29, 2016 at 6:52 PM
> >>>> To: ceph-community@xxxxxxxxxxxxxx
> >>>>
> >>>>
> >>>> Hello All,
> >>>> There are some serious problems with Ceph that may waste storage
> >>>> capacity when using high-end storage systems (Hitachi, IBM, EMC,
> >>>> HP, ...) as back-end for OSD hosts.
> >>>>
> >>>> Imagine that in a real cloud we need *n petabytes* of storage
> >>>> capacity, and that commodity hardware's or OSD servers' hard disks
> >>>> can't provide this amount of capacity; thus we have to use storage
> >>>> systems as back-end for the OSD hosts (to implement OSD daemons).
> >>>>
> >>>> But because almost all of these storage systems (regardless of
> >>>> their brand) use RAID technology, and Ceph also replicates at
> >>>> least two copies of each object, a large amount of storage
> >>>> capacity is wasted.
> >>>>
> >>>> So is there any solution to this problem/misunderstanding?
> >>>>
> >>>> Regards
> >>>> Jack Makenz
> >>>>
> >>>> ----------
> >>>> From: *Nate Curry* <curry@xxxxxxxxxxxxx>
> >>>> Date: Mon, May 30, 2016 at 5:50 AM
> >>>> To: Jack Makenz <jack.makenz@xxxxxxxxx>
> >>>> Cc: Unknown <ceph-community@xxxxxxxxxxxxxx>
> >>>>
> >>>>
> >>>> I think the purpose of Ceph is to get away from having to rely on
> >>>> high-end storage systems and to provide the capacity to utilize
> >>>> multiple less expensive servers as the storage system.
> >>>>
> >>>> That being said, you should still be able to use the high-end
> >>>> storage systems with or without RAID enabled.  You could do away
> >>>> with RAID altogether and let Ceph handle the redundancy, or you
> >>>> could have LUNs assigned to hosts and put into use as OSDs.  You
> >>>> could make it work either way, but to get the most out of your
> >>>> storage with Ceph I think a non-RAID configuration would be best.
> >>>>
> >>>> Nate Curry
> >>>>
> >>>>> _______________________________________________
> >>>>> Ceph-community mailing list
> >>>>> Ceph-community@xxxxxxxxxxxxxx
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
> >>>>>
> >>>>>
> >>>> ----------
> >>>> From: *Doug Dressler* <darbymorrison@xxxxxxxxx>
> >>>> Date: Mon, May 30, 2016 at 6:02 AM
> >>>> To: Nate Curry <curry@xxxxxxxxxxxxx>
> >>>> Cc: Jack Makenz <jack.makenz@xxxxxxxxx>, Unknown <
> >>>> ceph-community@xxxxxxxxxxxxxx>
> >>>>
> >>>>
> >>>> For non-technical reasons I had to run ceph initially using SAN
> >>>> disks.
> >>>>
> >>>> Lesson learned:
> >>>>
> >>>> Make sure deduplication is disabled on the SAN :-)
> >>>>
> >>>>
> >>>>
> >>>> ----------
> >>>> From: *Jack Makenz* <jack.makenz@xxxxxxxxx>
> >>>> Date: Mon, May 30, 2016 at 9:05 AM
> >>>> To: Nate Curry <curry@xxxxxxxxxxxxx>, ceph-community@xxxxxxxxxxxxxx
> >>>>
> >>>>
> >>>> Thanks Nate,
> >>>> But as I mentioned before, providing petabytes of storage capacity
> >>>> on commodity hardware or enterprise servers is almost impossible.
> >>>> Of course it is possible by installing hundreds of servers with
> >>>> 3-terabyte hard disks, but this solution wastes data center raised
> >>>> floor space, power, and also *money* :)
> >>>
> >>>
> >>> --
> >>> Christian Balzer        Network/Systems Engineer
> >>> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> >>> http://www.gol.com/
> >>
> >>
> >>
> >
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



