Hi all,
I'm looking for some information on several distributed filesystems for our application. It looks like it has finally come down to two candidates, Ceph being one of them. But there are still a few questions about it that I would really like to clarify, if possible.
Our plan, initially on 6 workstations, is to host a distributed file system that can withstand two simultaneous computer failures without data loss (something reminiscent of RAID 6, but over the network). This file system will also need to be remotely mounted (NFS server with fallbacks) by 5+ other computers. Students will be working on all 11+ computers at the same time, with different requirements from different software packages: some use many small files, others a few really big files of hundreds of GB. Absolutely no hardware modifications are allowed. This initial test bed is for undergraduate student use, but if successful it will also be deployed on our small clusters. The connection is simple GbE.
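To make the RAID 6 analogy concrete, what we have in mind is something along the lines of Ceph's erasure-code profiles; a minimal sketch (the pool and profile names are ours, and we are assuming a release recent enough to have erasure coding):

    # 4 data + 2 coding chunks spread across hosts: any 2 of the
    # 6 workstations can fail without data loss, like RAID 6
    ceph osd erasure-code-profile set raid6like k=4 m=2 crush-failure-domain=host
    ceph osd pool create studentdata 128 128 erasure raid6like

Is that the intended way to get RAID-6-like behaviour, or is plain replication the recommended route on a cluster this small?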
Our main concerns are:
1) Data resilience: It seems that keeping two copies of each object is the default setting; is that correct? And if so, will it stripe parity data among three computers for each block?
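For reference, our current understanding is that the replication level is set per pool, something like the following (the pool name is assumed, and min_size is chosen so I/O can continue even with two hosts down):

    # keep 3 copies of every object so any two hosts can fail
    ceph osd pool set data size 3
    # allow I/O to continue even when only 1 copy remains
    ceph osd pool set data min_size 1

Please correct us if that is not how it works.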
2) Metadata resilience: We have seen that we can now have more than a single metadata server (which was a show-stopper on previous versions). However, do they have to be dedicated boxes, or can they share boxes with the data servers? Can it be configured in such a way that even if two metadata-server computers fail, the whole system's data will still be accessible from the remaining computers, without interruptions? Or do the multiple servers just hold different data, aiming only at performance?
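What we would like to run, sketched as an old-style ceph.conf fragment (hostnames are ours), is an MDS colocated with an OSD on each box, with the extra MDSs acting as standbys:

    ; two of the six workstations, each running both an OSD and an MDS
    [osd.0]
        host = ws1
    [mds.a]
        host = ws1
    [osd.1]
        host = ws2
    [mds.b]
        host = ws2

Is that a supported and sane layout?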
3) Compatibility with other software: We have seen that there is an NFS incompatibility; is that correct? Also, are there any POSIX issues?
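In case it matters, the setup we were hoping for is a kernel CephFS mount re-exported through the standard Linux NFS server, roughly like this (monitor address, key file, and subnet are placeholders):

    # on the gateway box: mount CephFS with the kernel client
    mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/secret

    # /etc/exports: re-export it; fsid= is needed because CephFS has
    # no stable device number for nfsd to derive one from
    /mnt/ceph 192.168.0.0/24(rw,fsid=20,no_subtree_check)

Is that the kind of combination that is known to be problematic?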
4) No single (or double) point of failure: Every single component has to be able to endure a *double* failure (yes, things can take time to get fixed here). Does Ceph need a single master server for any of its activities? Can it endure a double failure? How long would any sort of failover take to complete, and would users need to wait to regain access?
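Our reading so far is that the monitors are the closest thing to a "master", and that a quorum of floor(n/2)+1 of them must stay alive; if so, we would deploy five across the six workstations (names and addresses below are placeholders):

    ; 5 monitors: quorum is 3, so any 2 boxes can fail
    ; (3 monitors would only survive a single failure)
    [global]
        mon initial members = a, b, c, d, e
        mon host = 192.168.0.1, 192.168.0.2, 192.168.0.3, 192.168.0.4, 192.168.0.5

Is that the right mental model, and is there anything else with a single (or double) point of failure that we should watch for?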
I think that covers our initial questions. Apologies if this happens to be the wrong list for them.
Looking forward to any answers or suggestions,
Regards,
Jones