Re: Ceph for online file storage

Hi Moïn,

Three suggestions, based on my experience:

1. The maximum size for good-quality 7200 RPM spinning SATA/SAS HDDs is 4 TB.

Anything larger will ruin your performance (unless you do pure
archiving of files, i.e. writing them once and "never" touching them again).

If you have 8 TB HDDs, fill them to 50% at most.

2. Use an SSD cache tier with SSDs that can sustain continuous IO
operations. Depending on the size of that cache tier, you might be able
to use more than 4 TB per 7200 RPM spinning HDD.

3. Of course, though this is already quite standard: use SSDs for the
journals and metadata (take care to use the right SSDs for that).

Have a look at

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

to get an idea of what I mean.
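
The gist of that post is to measure how many small synchronous (O_DSYNC) writes a drive can sustain, since that is what journal writes demand. A very rough sketch of such a test in Python (not the exact commands from the post; the target path is a placeholder, and pointing it at a raw device would destroy data):

    import os, time

    TARGET = "/mnt/ssd/journal-test.bin"   # placeholder: a file on the SSD under test
    BLOCK = b"\0" * 4096                   # 4 KiB writes, roughly journal-sized
    SECONDS = 10

    # O_DSYNC makes every write wait until the data is on stable storage,
    # which is what the Ceph journal requires of an SSD.
    fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
    count = 0
    start = time.time()
    while time.time() - start < SECONDS:
        os.write(fd, BLOCK)
        count += 1
    os.close(fd)
    print("%.0f synchronous 4 KiB writes per second" % (count / SECONDS))

A good journal SSD sustains thousands of these per second; a consumer drive often collapses to a few hundred.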

Good luck!

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 at the local court (Amtsgericht) of Hanau
Managing director: Oliver Dzombic

Tax No.: 35 236 3622 1
VAT ID: DE274086107


On 30.06.2016 at 10:34, m.danai@xxxxxxxxxx wrote:
> Thank you all for your prompt answers.
> 
>> Firstly, a wall of text makes things incredibly hard to read.
>> Use paragraphs/returns liberally.
> 
> I actually made sure to use paragraphs. For some reason, the formatting was removed.
> 
>> Is that your entire experience with Ceph, ML archives and docs?
> 
> Of course not; I have already been through the whole documentation many times. It's just that I couldn't really decide between the choices I was given.
> 
>> What's an "online storage"?
>> I assume you're talking about what is commonly referred to as "cloud
>> storage".
> 
> I try not to use the term "cloud", but if you must, then yes that's the idea behind it. Basically an online hard disk.
> 
>> 10MB is not a small file in my book; 1-4KB (your typical mail) are small
>> files.
>> How much data (volume/space) are you looking at initially and within a
>> year of deployment?
> 
> 10MB is small compared to the larger files, but it is indeed bigger than the smaller, IOPS-intensive files (like the emails you pointed out).
> 
> Right now there are two servers, each with 12x8TB drives. I expect to add about the same amount every 2-3 months.
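> 
> As a rough back-of-the-envelope (3x replication is an assumption on my part):
> 
>     raw_tb    = 2 * 12 * 8     # 192 TB raw across the two current nodes
>     usable_tb = raw_tb / 3     # ~64 TB usable with 3x replication
>     print(raw_tb, usable_tb)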
> 
>> What usage patterns are you looking at, expecting?
> 
> Since my customers will put their files on this "cloud", it's generally write once, read many (or at least more reads than writes).
> As they will most likely store private documents, along with some bigger files, the smaller files are predominant.
> 
>> That's quite the blanket statement and sounds like it's from a sales brochure. 
>> SSDs for OSD journals are always a good idea.
>> Ceph scales first and foremost by adding more storage nodes and OSDs.
> 
> What I meant by scaling is that as the number of customers grows, there will be more and more small files, so SSDs are a must to keep decent performance at
> that point. I can add many OSDs, but if they are all struggling with IOPS then it's no use (beyond having more space).
> 
>> Are we talking about existing HW or what you're planning?
> 
> That is existing hardware. Given the high capacity of the drives, I went with a more powerful CPU to save myself future headaches.
> 
>> Also, avoid large variations in your storage nodes if at all possible,
>> especially in your OSD sizes.
> 
> Say I have two nodes, one with 12 OSDs and the other with 24, all drives the same size. Would that cause any issue (apart from the failure domain)?
> 
> I think it is clear that native calls are the way to go; even the docs point you in that direction. Now the issue is that the clients need to have a file/directory structure.
> 
> The access topology is as follows:
> 
> Customer <-> customer application <-> server application <-> Ceph cluster
> 
> The customer has to be able to make directories, as with an FTP server for example. Using CephFS would make this task very easy, though at the expense of some performance.
> With native calls, since everything is stored as an object, it gets trickier to provide this feature. Perhaps some naming scheme would make this possible (see the sketch below).
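> 
> For illustration only (a minimal sketch; the pool name, user names and paths are made up, and our application is actually in Java, but the Python rados bindings show the idea), a per-user key prefix could look like this, with librados namespaces as an alternative way to keep users apart:
> 
>     import rados
> 
>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>     cluster.connect()
>     ioctx = cluster.open_ioctx('userdata')          # hypothetical pool name
> 
>     # simplest scheme: prefix every object name with the user id
>     ioctx.write_full('alice/docs/report.pdf', b'file contents here')
>     data = ioctx.read('alice/docs/report.pdf')      # read() returns only the first 8 KiB unless a length is given
> 
>     # alternative: one librados namespace per user, keeping the key itself clean
>     ioctx.set_namespace('bob')
>     ioctx.write_full('docs/report.pdf', b'other contents')
> 
>     ioctx.close()
>     cluster.shutdown()
> 
> Listing a "directory" then becomes listing objects by prefix (or per namespace) on the application side.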
> 
> Kind regards,
> 
> Moïn Danai.
> 
> ----Original Message----
> From : chibi@xxxxxxx
> Date : 27/06/2016 - 02:45 (CEST)
> To : ceph-users@xxxxxxxxxxxxxx
> Cc : m.danai@xxxxxxxxxx
> Subject : Re:  Ceph for online file storage
> 
> 
> Hello,
> 
> Firstly, a wall of text makes things incredibly hard to read.
> Use paragraphs/returns liberally.
> 
> Secondly, what Yang wrote.
> 
> More inline.
> On Sun, 26 Jun 2016 18:30:35 +0000 (GMT+00:00) m.danai@xxxxxxxxxx wrote:
> 
>> Hi all,
>> After a quick review of the mailing list archive, I have a question that
>> is left unanswered: 
> 
> Is that your entire experience with Ceph, ML archives and docs?
> 
>> Is Ceph suitable for online file storage, and if
>> yes, shall I use RGW/librados or CephFS ? 
> 
> What's an "online storage"? 
> I assume you're talking about what is commonly referred to as "cloud
> storage".
> Cloud storage also typically means HTTP and S3, and thus RGW would be the
> classic fit. 
> 
> But that's up to you really.
> 
> For example OwnCloud (and thus NextCloud) can use Ceph RGW as a storage
> backend. 
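> 
> As a rough illustration (the endpoint, credentials and bucket/key names below are placeholders, not anything from this thread), talking S3 to RGW from Python with boto3 looks roughly like this:
> 
>     import boto3
> 
>     s3 = boto3.client(
>         's3',
>         endpoint_url='http://rgw.example.com:7480',   # placeholder host; 7480 is the civetweb default
>         aws_access_key_id='ACCESS_KEY',
>         aws_secret_access_key='SECRET_KEY',
>     )
> 
>     s3.create_bucket(Bucket='user-alice')
>     s3.put_object(Bucket='user-alice', Key='docs/report.pdf', Body=b'file contents here')
>     obj = s3.get_object(Bucket='user-alice', Key='docs/report.pdf')
>     print(obj['Body'].read())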
> 
>> The typical workload here is
>> mostly small files 50kB-10MB and some bigger ones 100MB+ up to 4TB max
>> (roughly 70/30 split). 
> 10MB is not a small file in my book; 1-4KB (your typical mail) are small
> files.
> How much data (volume/space) are you looking at initially and within a
> year of deployment?
> 
> What usage patterns are you looking at, expecting?
> 
>> Caching with SSDs is critical in achieving
>> scalable performance as OSD hosts increase (and files as well). 
> 
> That's quite the blanket statement and sounds like it's from a sales brochure. 
> SSDs for OSD journals are always a good idea.
> Ceph scales first and foremost by adding more storage nodes and OSDs.
> 
> SSD based cache-tiers (quite a different beast to journals) can help, but
> that's highly dependent on your usage patterns as well as correct sizing
> and configuration of the cache pool.
> 
> For example, one of your 4TB files above could potentially wreak havoc with
> a cache pool of similar size.
> 
>> OSD
>> nodes have between 12 and 48 8TB drives. 
> 
> Are we talking about existing HW or what you're planning?
> 12 OSDs per node are a good start and what I usually aim for; 24 are
> feasible if you have some idea of what you're doing.
> More than 24 OSDs per node requires quite the insight and significant
> investments in CPU and RAM. Tons of threads about this here.
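> 
> For a rough sense of scale, using the common ~1 GB of RAM per TB of OSD storage rule of thumb (an approximation, not a hard requirement):
> 
>     osds_per_node = 24
>     tb_per_osd    = 8
>     ram_gb = osds_per_node * tb_per_osd      # ~192 GB of RAM suggested for a 24 x 8 TB node
>     print(ram_gb)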
> 
> Read the current thread "Dramatic performance drop at certain number of
> objects in pool" for example.
> 
> Also, avoid large variations in your storage nodes if at all possible,
> especially in your OSD sizes.
> 
> Christian
> 
>> If using CephFS, the hierarchy
>> would include alphabet letters at the root and then a user's directory
>> in the appropriate subfolder. With native calls, I'm not quite
>> sure how to retrieve file A from user A and not from user B. Note that the
>> software which processes user data is written in Java and deployed on
>> multiple client-facing servers, so rados integration should be easy.
>> Kind regards, Moïn Danai.
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



