Re: Ceph for online file storage

Thank you all for your prompt answers.

>firstly, wall of text, makes things incredibly hard to read.
>Use paragraphs/returns liberally.

I actually made sure to use paragraphs. For some reason, the formatting was removed.

>Is that your entire experience with Ceph, ML archives and docs?

Of course not; I have already been through the whole documentation many times. It's just that I couldn't really decide between the choices I was given.

>What's an "online storage"?
>I assume you're talking about what is commonly referred to as "cloud
>storage".

I try not to use the term "cloud", but if you must, then yes that's the idea behind it. Basically an online hard disk.

>10MB is not a small file in my book, 1-4KB (your typical mail) are small
>files.
>How much data (volume/space) are you looking at initially and within a
>year of deployment?

10MB is small compared to the larger files, but it is indeed bigger than the smaller, IOPS-intensive files (like the emails you pointed out).

Right now there are two servers, each with 12x8TB. I expect to add roughly the same capacity every 2-3 months.

>What usage patterns are you looking at, expecting?

Since my customers will put their files on this "cloud", it's generally write once, read many (or at least more reads than writes).
As they will most likely store private documents, with some bigger files as well, the smaller files are predominant.

>That's quite the blanket statement and sounds like from a sales brochure.
>SSDs for OSD journals are always a good idea.
>Ceph scales first and foremost by adding more storage nodes and OSDs.

What I meant by scaling is that as the number of customers grows, so does the number of small files, and so in order to keep decent performance at
that point, SSDs are a must. I can add many OSDs, but if they are all struggling with IOPS then it's no use (except for having more space).

>Are we talking about existing HW or what you're planning?

That is existing hardware. Given the high capacity of the drives, I went with a more powerful CPU to spare myself future headaches.

>Also, avoid large variations in your storage nodes if anyhow possible,
>especially in your OSD sizes.

Say I have two nodes, one with 12 OSDs and the other with 24. All drives are the same size. Would that cause any issues (apart from the failure domain)?

I think it is clear that native calls are the way to go; even the docs point you in that direction. The issue now is that the clients need to have a file directory structure.

The access topology is as follows:

Customer <-> customer application <-> server application <-> Ceph cluster

The customer has to be able to create directories, as with an FTP server for example. Using CephFS would make this task very easy, though at the expense of some performance.
With native calls, since everything is treated as an object, it gets trickier to provide this feature. Perhaps some naming scheme would make this possible (see the sketch below).
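
To illustrate what I have in mind, here is a minimal sketch of such a naming scheme, assuming one flat pool of objects. The class name and the "u/<userId>/<path>" key layout are purely illustrative, and the listing here works over an in-memory list of names; the actual reads, writes and listings would of course go through the librados Java bindings.

import java.util.ArrayList;
import java.util.List;

public class UserObjectNamer {

    // Build the RADOS object name for a given user and virtual path.
    public static String objectName(String userId, String virtualPath) {
        // Normalise the path so "docs//a.pdf" and "/docs/a.pdf" map to the same object.
        String clean = virtualPath.replaceAll("/+", "/").replaceAll("^/", "");
        return "u/" + userId + "/" + clean;
    }

    // Emulate "ls <dir>" by prefix-matching over a list of object names
    // (in practice the names would come from an object listing via librados).
    public static List<String> listDirectory(List<String> allObjectNames,
                                             String userId, String dir) {
        String prefix = objectName(userId, dir.endsWith("/") ? dir : dir + "/");
        List<String> entries = new ArrayList<>();
        for (String name : allObjectNames) {
            if (name.startsWith(prefix)) {
                entries.add(name.substring(prefix.length()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> pool = new ArrayList<>();
        pool.add(objectName("alice", "docs/contract.pdf"));
        pool.add(objectName("alice", "docs/photos/summer.jpg"));
        pool.add(objectName("bob", "docs/contract.pdf"));

        // Only alice's "docs" entries come back; bob's object with the same
        // relative path is invisible because the prefix differs.
        System.out.println(listDirectory(pool, "alice", "docs"));
    }
}

The obvious caveat is that "listing a directory" then means enumerating object names by prefix, which gets expensive as the pool grows, so a separate per-directory index (omap or a small database) is probably worth considering.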

Kind regards,

Moïn Danai.

----Original Message----
>From : chibi@xxxxxxx
Date : 27/06/2016 - 02:45 (CEST)
To : ceph-users@xxxxxxxxxxxxxx
Cc : m.danai@xxxxxxxxxx
Subject : Re:  Ceph for online file storage


Hello,

firstly, wall of text, makes things incredibly hard to read.
Use paragraphs/returns liberally.

Secondly, what Yang wrote.

More inline.
On Sun, 26 Jun 2016 18:30:35 +0000 (GMT+00:00) m.danai@xxxxxxxxxx wrote:

> Hi all,
> After a quick review of the mailing list archive, I have a question that
> is left unanswered: 

Is that your entire experience with Ceph, ML archives and docs?

>Is Ceph suitable for online file storage, and if
> yes, shall I use RGW/librados or CephFS ? 

What's an "online storage"?
I assume you're talking about what is commonly referred to as "cloud
storage".
Which also typically tends to use HTTP, S3 and thus RGW would be the
classic fit. 

But that's up to you really.

For example OwnCloud (and thus NextCloud) can use Ceph RGW as a storage
backend. 

>The typical workload here is
> mostly small files 50kB-10MB and some bigger ones 100MB+ up to 4TB max
> (roughly 70/30 split). 
10MB is not a small file in my book, 1-4KB (your typical mail) are small
files.
How much data (volume/space) are you looking at initially and within a
year of deployment?

What usage patterns are you looking at, expecting?

>Caching with SSDs is critical in achieving
> scalable performance as OSD hosts increase (and files as well). 

That's quite the blanket statement and sounds like from a sales brochure.
SSDs for OSD journals are always a good idea.
Ceph scales first and foremost by adding more storage nodes and OSDs.

SSD based cache-tiers (quite a different beast to journals) can help, but
that's highly dependent on your usage patterns as well as correct sizing
and configuration of the cache pool.

For example, one of your 4TB files above could potentially wreak havoc with
a cache pool of similar size.

>OSD
> nodes have between 12 and 48 8TB drives. 

Are we talking about existing HW or what you're planning?
12 OSDs per node are a good start and what I aim for usually; 24 are
feasible if you have some idea what you're doing.
More than 24 OSDs per node requires quite the insight and significant
investments in CPU and RAM. Tons of threads about this here.

Read the current thread "Dramatic performance drop at certain number of
objects in pool" for example.

Also, avoid large variations in your storage nodes if anyhow possible,
especially in your OSD sizes.

Christian

>If using CephFS, the hierarchy
> would include alphabet letters at the root and then a user's directory
> in the appropriate subfolder. With native calls, I'm not quite
> sure how to retrieve file A from user A and not user B. Note that the
> software which processes user data is written in Java and deployed on
> multiple client-facing servers, so rados integration should be easy.
> Kind regards, Moïn Danai.


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



